Abstract

We study the classic Vehicle Routing Problem in the setting of stochastic optimization with
recourse. StochVRP is a two-stage optimization problem, where demand is satisfied using two routes:
fixed and recourse. The fixed route is computed using only a demand distribution. Then after
observing the demand instantiations, a recourse route is computed, but costs here become more
expensive by a factor A.

We present an O(log^2 n · log(nA))-approximation algorithm for this stochastic routing problem,
under arbitrary distributions. The main idea in this result is relating StochVRP to a special case of
submodular orienteering, called knapsack rank-function orienteering. We also give a better
approximation ratio for knapsack rank-function orienteering than what follows from prior work. Finally, we
provide a Unique Games Conjecture based ω(1) hardness of approximation for StochVRP, even on
star-like metrics on which our algorithm achieves a logarithmic approximation.

1 Introduction

Consider a distribution problem involving a depot location and a set of customer locations. There is
a vehicle of capacity Q that is used to distribute items. The demand at customer locations is random
with a known (joint) distribution 𝒟. The distributor wants to plan a fixed route for this capacitated
vehicle, that will be employed on a daily basis. However, due to the stochastic nature of demands, the
fixed route might be insufficient to meet all demands. Therefore the distributor also plans a secondary
recourse strategy, that satisfies all unmet demands after the fixed route. Each morning the distributor
receives the precise demand quantities from all customers (drawn from 𝒟). Based on this he/she decides
which subset of customers will be satisfied along the fixed route, and then plans a recourse route to
satisfy the remaining customers. The goal is to minimize the cost of the fixed route plus the expected
cost of the recourse route. Examples of real-world applications are local deposit collection from bank
branches, garbage collection, home heating oil delivery, and forklift routing [?].

A solution based on fixed routes is desirable for several reasons, and is commonly used in practice;
see [30, 15] for more detailed discussions on this. In our context, there are at least two advantages. First,
the driver can get familiar with the road/traffic conditions which results in time savings. Moreover,
having fixed routes simplifies the everyday route planning process: the incremental recourse step will
typically contain fewer demands.

Fixed-route problems are often modeled in the framework of two-stage stochastic optimization.
A priori optimization handles some natural but simple recourse strategies: e.g., short-cutting over
customers without demand in TSP [?, 32], and refill-visits from the depot in the Vehicle Routing
Problem (VRP) [19]. Recently, more complex recourse actions have been considered: adding penalty
terms in deadline TSP [?], and using backup vehicles in VRP [?].

In this paper, we penalize the cost of the recourse route by an inflation factor A > 1. This is also a
common approach for two-stage stochastic optimization with recourse. Furthermore, in the stochastic
VRP we consider, recourse strategies are non-trivial since they also involve choosing the subset of realized
demands served by the fixed route. In this respect it is unlike most previously studied 2-stage stochastic
problems (e.g. [?], [20]) where the recourse step is just a deterministic instance of the same problem.
Before describing the results of this paper, we define the deterministic and two-stage stochastic VRP
below.

Vehicle Routing Problem (VRP). There is a vehicle of capacity Q, metric (V, d) with root/depot
r ∈ V and demands {q_v ≤ Q}_{v ∈ V}. The goal is to find a minimum cost tour of the vehicle that delivers q_v
units to each v ∈ V. The demands are "unsplittable", i.e. the demand at any vertex must be satisfied in
a single visit. Any VRP solution corresponds to a sequence of round-trips from the depot, where at most
Q units of demand are served during each round-trip. It is well-known that an α-approximation
algorithm for TSP implies an (α + 2)-approximation algorithm for VRP.
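To make the definition concrete, the following Python sketch evaluates a candidate VRP solution given as a list of round-trips. It is only an illustration: the function name `vrp_cost`, the encoding of the metric as a distance matrix with the depot at index 0, and the encoding of a round-trip as the ordered list of vertices it serves are assumptions made here, not notation from the paper.

```python
def vrp_cost(trips, demand, dist, capacity, depot=0):
    """Return the total length of the round-trips, or raise ValueError if
    the solution is infeasible.

    trips    -- list of round-trips; each is the ordered list of vertices served
    demand   -- dict vertex -> demand q_v (hypothetical encoding)
    dist     -- symmetric distance matrix, dist[u][v] = d(u, v)
    capacity -- vehicle capacity Q
    """
    total = 0
    served = []
    for trip in trips:
        # Each round-trip may serve at most Q units of demand.
        if sum(demand[v] for v in trip) > capacity:
            raise ValueError("round-trip exceeds vehicle capacity")
        route = [depot, *trip, depot]
        total += sum(dist[a][b] for a, b in zip(route, route[1:]))
        served.extend(trip)
    # Unsplittable demands: every vertex with positive demand is served
    # in exactly one visit.
    required = sorted(v for v, q in demand.items() if q > 0)
    if sorted(v for v in served if demand[v] > 0) != required:
        raise ValueError("some demand is unserved or served more than once")
    return total
```

For example, with two customers of demand 2 each, capacity 3 forces two separate round-trips, while capacity 4 allows a single round-trip through both.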

Two-stage Stochastic VRP (StochVRP). The setting is the same as above, with a capacity Q vehicle,
metric (V, d) and depot r ∈ V. Here the demands {q_v}_{v ∈ V} are random variables given by a joint demand
distribution 𝒟 on {0, 1, ..., Q}^V, available as a black-box that can be sampled from. We are also given
an inflation parameter A > 1. The goal is to compute a fixed route solution with a recourse strategy.

• In the first stage the algorithm computes a fixed tour τ, without knowledge of the actual demand.
The tour τ consists of several round-trips from the depot: each round-trip is a cycle containing
r (henceforth called r-tour). We represent τ as a concatenation {τ_1, ..., τ_F} of r-tours. It is
important to note that τ only represents the vehicle route, and does not specify demand deliveries
(this will be decided after demand instantiations). In particular, a vertex v may appear in multiple
r-tours of τ; and even if v appears in τ the instantiated demand at v may not eventually be satisfied
by τ.

• In the second stage, the demands q are instantiated from 𝒟. Knowing this, an algorithm chooses
to satisfy a subset q_A ⊆ q of demands using the fixed tour τ, subject to the vehicle capacity of Q.
That is, for each r-tour {τ_i}_{i=1}^F the algorithm chooses a subset S_i ⊆ τ_i of vertices to serve, where
Σ_{v ∈ S_i} q_v ≤ Q, and sets q_A = {q_v : v ∈ ∪_{i=1}^F S_i}. Then the algorithm computes a recourse tour
σ meeting all residual demands q_B = q \ q_A. That is, σ is a solution to the deterministic VRP
instance with demands {q_v : v ∈ V \ ∪_{i=1}^F S_i}.

Note that the demands q_A satisfied by the fixed tour τ differ based on the instantiation q; however
the route taken by the vehicle stays fixed. So the first stage cost is just the length d(τ) of the fixed
tour. The recourse tour σ clearly depends on the demand instantiation. The second stage cost under
demand q is A · d(σ(q)), the length of the recourse tour inflated by a parameter A. The objective in
StochVRP is to minimize the expected total cost:

$$ d(\tau) + A \cdot \mathbb{E}_{q \leftarrow \mathcal{D}} \left[ d(\sigma(q)) \right] $$
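As a small worked instance of this objective, the Python sketch below evaluates it when the distribution is given as an explicit list of scenarios with probabilities. The function name `stochvrp_objective` and the input encoding (one recourse-tour length per scenario) are assumptions made for illustration.

```python
import math


def stochvrp_objective(fixed_len, recourse_lens, probs, inflation):
    """Expected total cost: d(fixed tour) + A * E[d(recourse tour)].

    fixed_len     -- length of the fixed tour
    recourse_lens -- length of the recourse tour chosen under each scenario
    probs         -- probability of each scenario (must sum to 1)
    inflation     -- inflation factor A applied to second-stage lengths
    """
    if len(recourse_lens) != len(probs):
        raise ValueError("one recourse length is needed per scenario")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("scenario probabilities must sum to 1")
    expected_recourse = sum(p * length for p, length in zip(probs, recourse_lens))
    return fixed_len + inflation * expected_recourse
```

With a fixed tour of length 10, inflation factor 3, and two equally likely scenarios whose recourse tours have lengths 0 and 4, the objective is 10 + 3 · 2 = 16.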

For any integer l ≥ 0, we let [l] := {1, ..., l}. For a given StochVRP instance, opt will denote its
optimal value. We let n = |V| denote the number of vertices in the metric and D = max_{u,v} d(u, v) the
diameter of the metric.

Our Results, Techniques and Outline. In this paper we show: 

Theorem 1.1 There is a randomized O(log^2 n · log(nA))-approximation algorithm for StochVRP under
arbitrary distributions.



Using a sampling-based reduction [10] we show (in Subsection 3.1) that the objective value under any
black-box distribution can be well-approximated by another demand distribution having support size
m = poly(n, A).

Then, in Section 2 we present an O(log^2 n · log(nm))-approximation algorithm for StochVRP where
m is the support size of the distribution. This is a set-cover type algorithm that uses the submodular
orienteering problem [12], [?] as a subroutine. In the submodular orienteering problem there is a metric
(V, d) with root r, length bound B and monotone submodular function f : 2^V → R_+; and the goal is
to find an r-tour of length at most B visiting some subset S ⊆ V of vertices so as to maximize f(S).
Direct use of algorithms from [12], [?] yields an approximation ratio worse than Theorem 1.1 by a factor
of log^ε n. Instead we give a better result for submodular orienteering on objective functions of the type
encountered in StochVRP, called knapsack rank-function orienteering (KnapRankOrient). In particular,
we consider the ratio KnapRankOrient problem where instead of the length-bound, the objective is to
maximize the ratio of function value to the length.

Theorem 1.2 There is a deterministic O(log^2 n)-approximation algorithm for ratio knapsack
rank-function orienteering.



The main idea here is to use LP rounding techniques for the related group Steiner problem [?],
augmented with an alteration step (for the analysis). While alteration has been widely used with
LP-rounding, e.g. [35], we are not aware of an application in the context of the group Steiner tree problem. This
step only bounds the function-value and length in expectation (separately). In order to bound their
ratio, we adapt the group Steiner derandomization from Charikar et al. [?] to our context. We defer
further discussion and details on KnapRankOrient to a later section.

Combined with the sampling-based reduction this suffices to approximate the objective value of
StochVRP under black-box distributions, satisfying the guarantee in Theorem 1.1. However, more work
is required in order to provide an approximate solution. This is because the recourse step in StochVRP
is quite non-trivial, and a solution must specify an algorithm to construct the recourse tour for any
possible demand (not merely the m sampled points). It turns out that the recourse step corresponds
to solving an "outlier" version of VRP. Although this problem does not admit any true approximation
ratio (by a relation to generalized assignment [27]), in Section 3 we give an LP-based O(1) bicriteria
approximation: this suffices for Theorem 1.1.



Our second main result is a UGC-based hardness of approximation: 

Theorem 1.3 Assuming the Unique Games Conjecture, it is NP-hard to approximate StochVRP to
within any constant factor, even on star-like metrics.

This is proved in a later section and involves a reduction from the vertex cover problem on k-uniform
hypergraphs: we use a result by Bansal and Khot [?] which says that it is UGC-hard to distinguish
between the (yes) case when the hypergraph is almost k-partite and the (no) case when any vertex cover
is almost the entire vertex-set. We remark that this super-constant hardness holds in star-like metrics,
where our algorithm achieves an O(log(nA))-approximation. Our algorithm loses additional log-factors
in going from (i) stars to trees, and then (ii) trees to general metrics: these overheads are similar to the
best known results for the related group Steiner tree problem [?].

Finally, we consider the special case when demands are independent across vertices. Using a different
algorithm we obtain a better ratio in Section 6.

Theorem 1.4 There is a randomized 0{ ^° j^T^jn )- approximation algorithm for StochVRP under inde- 
pendent demand distributions. 
We show that in this case we can enforce a certain solution structure, while losing an 0( ^°f^^^ )
factor in the optimal value. Specifically, we show that the demands can be partitioned into two groups: 
one where each demand is (almost always) served by the fixed tour, and another where each demand is 
served in the recourse tour. Then we use an LP-based algorithm to find the best such partition, losing 
another constant factor. We leave open the possibility of a constant approximation in the independent 
demands case. 



Related Work. The VRP [37] is an extensively studied routing problem that combines aspects of
both TSP and bin-packing. Several stochastic variants of the basic problem have received attention,
e.g. [36, ?, 14, ?]. Approximation algorithms for VRP with independent stochastic demands (in the
a priori model) were given in [?], [?]. This paper takes a different approach, that of two-stage stochastic
optimization with recourse (along the lines of [23, 26, 34, 20] etc). To the best of our knowledge no
prior approximation results are known for vehicle routing problems in this model.

Stochastic optimization [?] is a broad area dealing with probabilistic input. Approximation
algorithms for two-stage stochastic problems were introduced by Immorlica et al. [23] and Ravi-Sinha [29].
Gupta et al. [20] and Shmoys-Swamy [34] gave general frameworks for approximating a number of
stochastic optimization problems; the former result is combinatorial using certain cost-sharing
properties, whereas the latter is LP-based. However, these approaches do not seem directly applicable to
StochVRP. The results in [20, 34] hold in the most general distribution model, where an algorithm only
receives independent samples from a black-box. Charikar et al. [10] showed that any arbitrary
distribution can be reduced to one having polynomial support (under certain conditions). We also make
use of this result in proving Theorem 1.1. For most other combinatorial optimization problems that
have been considered in the two-stage stochastic model (with proportional cost inflation), it has been
observed that approximation ratios are the same order of magnitude as the underlying deterministic
problem [29, 20, 34, 33]. A notable exception is minimum cost max-matching [24], for which an
Ω(log n)-hardness of approximation was shown. In the case of VRP, Theorem 1.3 shows (under UGC)
that the stochastic approximation ratio is necessarily worse than its deterministic counterpart, even in
very special metrics.



2 Algorithm for Polynomial Scenarios 

Here we consider the case when the demand distribution 𝒟 is specified as a list of possible outcomes.
Later on we show how the general case of a black-box distribution can be reduced to this case. Formally
𝒟 is a multiset {q^1, ..., q^m} where the actual demand q = q^i (for some i ∈ [m]) with probability 1/m.

The main idea of our algorithm is to recast the problem as an instance of set-cover with an
exponential number of sets. Then we show that the greedy subproblem is an instance of submodular
orienteering (SOP) for which a poly-logarithmic approximation is known [?], [12]. In fact, for the type of
SOP instances obtained from StochVRP we give a better approximation ratio in a later section. Altogether,
this implies Theorem 1.1 for polynomial scenarios.

Set cover instance I. The groundset U consists of tuples (i, v) for all scenarios i ∈ [m] and vertices
v ∈ V, which denotes q^i(v) demand units at v under scenario i. For any (i, v) ∈ U we use q((i, v)) :=
q^i(v), and for any subset S ⊆ U, q(S) := Σ_{e ∈ S} q(e). Instance I has the following two types of sets:

1. S := ∪_{i=1}^m S_i is a first stage set iff S_i ⊆ {(i, v) : v ∈ V} and q(S_i) ≤ Q for all i ∈ [m]. The cost
of this set S is the minimum length of an r-tour that contains all the vertices represented in S.

2. For any scenario i ∈ [m], T ⊆ {(i, v) : v ∈ V} is a second stage set iff q(T) ≤ Q. The cost of set
T is A/m times the minimum length of an r-tour containing all vertices of T.
Lemma 2.1 The set cover instance I is equivalent to StochVRP.
Proof: Recall that any feasible StochVRP solution is specified by:

• The fixed tour τ. It will be convenient to view this as a collection {τ_1, ..., τ_F} of r-tours, each of
which is a round-trip from the depot.

• For each scenario i ∈ [m], the demands q^i_A ⊆ q^i satisfied by the fixed tour. Again this is viewed
as follows: for each r-tour {τ_j}_{j=1}^F, S_{i,j} ⊆ {(i, v) : v ∈ V} denotes the demands satisfied in τ_j.
Note that by definition, ∪_{j ∈ [F]} S_{i,j} = q^i_A. Also due to the capacity constraint, q(S_{i,j}) ≤ Q for each
j ∈ [F].

• For each scenario i ∈ [m], the recourse tour σ_i which satisfies residual demands q^i \ q^i_A. Again we
view this as a collection {σ_{i,1}, ..., σ_{i,L_i}} of r-tours. For k ∈ [L_i] let T_{i,k} ⊆ {(i, v) : v ∈ V} denote
the demands satisfied in σ_{i,k}. Clearly, ∪_{k ∈ [L_i]} T_{i,k} = q^i \ q^i_A. Again q(T_{i,k}) ≤ Q for all k ∈ [L_i].

Note that corresponding to each first stage r-tour τ_j, the set ∪_{i=1}^m S_{i,j} is a valid first stage set in I
since for all i ∈ [m] (a) S_{i,j} ⊆ {(i, v) : v ∈ V} and (b) q(S_{i,j}) ≤ Q. Moreover the cost of this set in I is
at most d(τ_j).

Similarly, for each scenario i ∈ [m] and second stage r-tour σ_{i,k} (k ∈ [L_i]), set T_{i,k} is a valid second
stage set. The cost of this set in I is at most (A/m) · d(σ_{i,k}).

Finally, these sets cover U in I since for each scenario i ∈ [m], we have:

$$ \Big( \bigcup_{j=1}^{F} S_{i,j} \Big) \cup \Big( \bigcup_{k \in [L_i]} T_{i,k} \Big) = \{ (i, v) : v \in V \}. $$

The total cost of this solution to I is at most:

$$ \sum_{j=1}^{F} d(\tau_j) + \frac{A}{m} \sum_{i=1}^{m} \sum_{k=1}^{L_i} d(\sigma_{i,k}) = d(\tau) + A \cdot \mathbb{E}_{q \leftarrow \mathcal{D}} \left[ d(\sigma(q)) \right], $$

which is just the StochVRP objective value. The reverse relation (from I to StochVRP) can be shown
in a similar manner, and the lemma follows. ■
Thus it suffices to solve the set cover instance I. We use the greedy algorithm for set cover which
requires solving the following max-coverage subproblem: given U' ⊆ U find a set (of either first/second
type) that maximizes the ratio of the number of U'-elements it covers to its cost. We give separate
algorithms for this problem, under the two types of sets.
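The greedy rule can be sketched in Python for a small, explicitly listed family of sets. This is only an illustration under simplifying assumptions: in the set cover instance above the family is exponentially large, so the exact ratio-maximizing step below would be replaced by the approximate max-coverage routines described next. The name `greedy_set_cover` and the `(elements, cost)` encoding are hypothetical.

```python
def greedy_set_cover(universe, sets):
    """Greedy weighted set cover on an explicit family.

    universe -- iterable of elements to cover
    sets     -- list of (frozenset_of_elements, positive_cost) pairs
    Returns the indices of the chosen sets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, 0.0
        for idx, (elements, cost) in enumerate(sets):
            gain = len(elements & uncovered)
            # Max-coverage subproblem: newly covered elements per unit cost.
            if gain and gain / cost > best_ratio:
                best, best_ratio = idx, gain / cost
        if best is None:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best][0]
    return chosen
```

If the ratio-maximizing step is only solved approximately, the standard greedy analysis degrades by the corresponding factor, which is why the approximation quality of the max-coverage subproblem matters.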

Max-coverage for second stage sets. We give a constant approximation in this case. Assume that
the algorithm knows by enumeration (i) the cost B of the best ratio set (up to a factor two), and (ii) the
scenario i ∈ [m] corresponding to it. Then it suffices to find a set T ⊆ U' ∩ {(i, v) : v ∈ V} maximizing
|T| such that q(T) ≤ Q and cost(T) ≤ B. By the definition of second stage sets, this reduces to finding
an r-tour visiting the maximum number of vertices W ⊆ {u ∈ V : (i, u) ∈ U'}, having length at most (m/A) · B
and with Σ_{u ∈ W} q^i(u) ≤ Q. This is just an instance of the knapsack-orienteering problem, for which a
constant-factor approximation is known [18].



Max-coverage for first stage sets. In this case, we obtain a poly-logarithmic approximation. Again,
we assume that the algorithm knows the cost B of the best ratio set (up to a factor two). Recall that
unlike the previous case, one first stage set can cover elements from several scenarios. By definition,
each first stage set S corresponds to an r-tour visiting vertices W ⊆ V and subsets S_i ⊆ {(i, v) : v ∈ W}
for each i ∈ [m] such that {q(S_i) ≤ Q}_{i=1}^m and S = ∪_{i=1}^m S_i. Among all first stage sets visiting a fixed
vertex-set W ⊆ V, the maximum coverage of U' equals:

$$ f(W) := \sum_{i=1}^{m} \max \Big\{ |S_i| \;:\; S_i \subseteq \{ u \in W : (i, u) \in U' \},\ \sum_{v \in S_i} q^i_v \le Q \Big\}. $$

For each i ∈ [m] let f_i(W) denote the term inside the above summation. Recall that the cost of all first
stage sets visiting vertices W is the same, namely the minimum TSP on {r} ∪ W. Thus the subproblem
we wish to solve is:

$$ \max \ \{ f(W) \;:\; \text{there is an } r\text{-tour visiting } W \subseteq V \text{ of length} \le B \}. \tag{1} $$

Recall the submodular orienteering problem (SOP) where given metric (V, d) with root r, bound B
and submodular function g : 2^V → R_+, the goal is to find an r-tour visiting some subset W ⊆ V of
vertices, having length at most B that maximizes g(W). If f were submodular then we could use the
algorithms of [?], [12] to solve this. But f is not submodular. Still, we show below that it can be well
approximated by a submodular function g.

We approximate each f_i (point-wise) by a submodular function g_i. Let V_i := {u ∈ V : (i, u) ∈ U'}
denote the vertices appearing with scenario i in U'. Define:

$$ g_i(W) := \max \Big\{ \sum_{v \in V_i \cap W} x_v \;:\; \sum_{v \in V_i \cap W} q^i_v \cdot x_v \le Q,\ \ 0 \le x_v \le 1 \ \ \forall v \in V_i \cap W \Big\}. $$

Observe that g_i(W) is just an LP relaxation for a maximization {0, 1}-knapsack problem. So its value
is given by the greedy algorithm that increases x_v (up to 1) in increasing order of {q^i_v : v ∈ V_i ∩ W}.
On the other hand, f_i(W) is the value of the same integral knapsack problem. Now, function g_i can be
rewritten as the rank function of a polymatroid which is submodular; see e.g. [?]. Moreover, the
integrality gap of the natural LP for max-knapsack is two. Thus,

Claim 2.2 g_i is monotone submodular and g_i(W)/2 ≤ f_i(W) ≤ g_i(W), ∀W ⊆ V.
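The two quantities compared in Claim 2.2 are easy to compute for a single scenario, since the knapsack has unit profits. The Python sketch below does so; the function names and the convention of passing the list of demands of the vertices in V_i ∩ W are assumptions made for illustration. Exact rational arithmetic is used so that the fractional value is not affected by rounding.

```python
from fractions import Fraction


def integral_coverage(demands, capacity):
    """f_i(W): maximum number of demands with total size at most Q.

    With unit profits, taking demands smallest-first is optimal.
    """
    count, used = 0, 0
    for q in sorted(demands):
        if used + q > capacity:
            break
        used += q
        count += 1
    return count


def fractional_coverage(demands, capacity):
    """g_i(W): value of the LP relaxation, filled greedily smallest-first."""
    value, remaining = Fraction(0), Fraction(capacity)
    for q in sorted(demands):
        if q <= remaining:
            value += 1
            remaining -= q
        else:
            # q > remaining >= 0, so q is positive and the division is safe.
            value += remaining / q
            break
    return value
```

For demands 2, 3, 4 and capacity 6 the integral value is 2 and the fractional value is 2 + 1/4, consistent with the sandwich g_i(W)/2 ≤ f_i(W) ≤ g_i(W).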

So if we define g(W) := Σ_{i=1}^m g_i(W) then it is submodular and maximizing g in (1) is
equivalent to maximizing f (up to factor two). Hence, assuming a ρ-approximation algorithm for SOP, we
obtain a 2ρ-approximation algorithm for (1). This suffices to give an O(ρ)-approximation for the
max-coverage subproblem. We have ρ = O(log^{2+ε} n) in polynomial time using the bicriteria approximation
in Calinescu-Zelikovsky [?], and ρ = O(log n) in quasi-polynomial time using the true approximation in
Chekuri-Pal [12]. In a later section we directly consider the ratio objective corresponding to (1), called ratio
knapsack rank-function orienteering, i.e.

$$ \max \Big\{ \frac{g(V(\tau))}{d(\tau)} \;:\; \tau \text{ is an } r\text{-tour visiting vertices } V(\tau) \Big\}, $$

and give an improved polynomial time O(log^2 n)-approximation algorithm for it.

Finally, we lose an additional log |U| = O(log(mn)) factor to solve the set cover instance I (which
is equivalent to StochVRP). Thus we obtain:

Theorem 2.3 There is a polynomial time O(log^2 n · log(nm))-approximation algorithm for StochVRP
for a polynomial number m of scenarios and n vertices. This ratio improves to O(log n · log(nm)) in
quasi-polynomial time.
3 Algorithm for General Distributions

In this section we prove Theorem 1.1 under an arbitrary distribution 𝒟 that is accessed by sampling.
We denote the input StochVRP instance by J. In Subsection 3.1 we apply a sampling-based reduction
from [10] to obtain an equivalent StochVRP instance J' with m = poly(n, A) scenarios. This allows us to
apply the algorithm from the previous section to approximate the optimal value of instance J. However
a solution to J must also specify a valid recourse strategy for every outcome q ∈ 𝒟, and not just for
the m outcomes in instance J'. It turns out that the recourse step is captured by an "outlier" version
of VRP, and we give an LP-based constant-factor bicriteria approximation for it in Subsection 3.2.



3.1 Sampling Based Reduction to Polynomial Scenarios 

Here we show that sampling can be used to reduce an arbitrary demand distribution to one having a 
polynomial number of scenarios. 

Given a fixed tour τ and scenario q ∈ 𝒟, let h(τ, q) denote the minimum cost of a recourse tour.
Note that computing h(τ, q) involves choosing a subset q_A of q to be served by τ (at zero cost, but
subject to capacity) and then optimally solving the VRP instance with demands q − q_A and lengths
inflated by factor A. Thus we can express the minimum objective value for a given fixed tour τ as:

$$ \mathrm{obj}(\tau) := d(\tau) + \mathbb{E}_{q \leftarrow \mathcal{D}} \left[ h(\tau, q) \right]. $$

The optimal value of StochVRP instance J is then min_{τ ∈ X} obj(τ), where X denotes the set of all
possible fixed tours. Consider drawing m independent samples {q^1, ..., q^m} from 𝒟, and let J' denote
the (random) instance of StochVRP with these as explicit scenarios. Define the sample-average objective:

$$ \mathrm{obj}'(\tau) := d(\tau) + \frac{1}{m} \sum_{i=1}^{m} h(\tau, q^i), \qquad \forall \text{ fixed tours } \tau \in X. $$

It is clear that min_{τ ∈ X} obj'(τ) is the optimal value of J'. We now use the result of Charikar et
al. [10] to relate these two instances. For completeness we give a proof adapted to our context. Let
D = max_{u,v} d(u, v) denote the diameter of the metric; we assume WLOG (by scaling) that all distances
are integral.

Theorem 3.1 ([10]) Using m = Θ(A^2 D^2 n^2 log |X|), with probability 1 − o(1),

$$ |\mathrm{obj}(\tau) - \mathrm{obj}'(\tau)| \le 1, \qquad \text{for all } \tau \in X. $$

Proof: Fix any fixed tour τ ∈ X. Define H := E_{q ← 𝒟} h(τ, q) and random variables H_i := h(τ, q^i) for
i ∈ [m]. Clearly E[H_i] = H for all i ∈ [m]. Note that H_i ≤ 2AnD: the worst case recourse action involves
a separate round-trip to each vertex. Since H_i/(2AnD) are independent [0, 1] random variables, by a
Chernoff bound [?],

$$ \Pr \left[ \, \left| H - \frac{1}{m} \sum_{i=1}^{m} H_i \right| > \epsilon \, \right] \le 2 \exp \left( - \frac{\epsilon^2 \cdot m}{4 A^2 n^2 D^2} \right), \qquad \forall \epsilon > 0. $$

Now observe that obj(τ) = d(τ) + H whereas the sample-average objective (the objective of J') equals
d(τ) + (1/m) · Σ_{i=1}^m H_i. Writing obj'(τ) for the latter and using the above
inequality with ε = 1 and m = 8 A^2 n^2 D^2 · log |X|, we obtain:

$$ \Pr \left[ \, |\mathrm{obj}(\tau) - \mathrm{obj}'(\tau)| > 1 \, \right] \le \frac{2}{|X|^2}, \qquad \forall \tau \in X. $$

Finally, a union bound over all τ ∈ X implies the theorem.
We now show that m = poly(n, A). 
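The arithmetic behind the choice of m can be checked numerically. The Python sketch below assumes the tail bound in the proof of Theorem 3.1 reads 2·exp(−ε²m / (4A²n²D²)) and that m = 8A²n²D²·log|X| samples are drawn; the function names and the convention of passing log|X| directly (to avoid forming |X| itself) are assumptions made for illustration.

```python
import math


def sample_count(inflation, n, diameter, log_num_tours):
    """Number of samples m = 8 * A^2 * n^2 * D^2 * log|X| (rounded up)."""
    return math.ceil(8 * inflation**2 * n**2 * diameter**2 * log_num_tours)


def tail_bound(m, inflation, n, diameter, eps=1.0):
    """Bound on Pr[|obj - sample average| > eps] for one fixed tour."""
    scale = 4 * inflation**2 * n**2 * diameter**2
    return 2 * math.exp(-(eps**2) * m / scale)
```

With this m and ε = 1 the exponent equals −2·log|X|, so the per-tour failure probability is at most 2/|X|², and the union bound over the |X| fixed tours leaves total failure probability at most 2/|X|.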



Claim 3.2 WLOG the number of r-tours in any fixed tour is at most n. Hence the number of edges 
used in the fixed tour is at most n 2 , and \X\ < 2 n . 

Proof: Consider an arbitrary fixed tour τ consisting of r-tours τ_1, ..., τ_F. Suppose that F > n: then
we will show there exists another fixed tour τ' with at most n r-tours such that obj(τ') ≤ obj(τ). Let us
number vertices so that the depot is numbered 0 and d(0, 1) ≤ d(0, 2) ≤ ··· ≤ d(0, n). For each j ∈ [F]
let M(j) ∈ [n] denote the maximum numbered vertex in r-tour τ_j; note d(τ_j) ≥ 2 · d(0, M(j)). Choose
k ∈ [n] as the minimum value so that |M^{-1}({k, ..., n})| ≤ n − k + 1; if there is no such value then set
k = n + 1.

Let G = M^{-1}({k, ..., n}) ⊆ [F]; note that G = ∅ when k = n + 1. By choice of k, we have
|G| ≤ n − k + 1.

Also by choice of k, it follows that |M^{-1}({t, ..., n})| > n − t + 1 for all t ≤ k − 1. Thus there is an
injective map φ : [k − 1] → [F] \ G such that M(φ(v)) ≥ v for all v ∈ [k − 1]; this can be obtained say
by a greedy assignment starting from k − 1 (recall |G| ≤ n − k + 1). Due to the vertex numbering (and
definitions of φ, M) we have 2 · d(0, v) ≤ 2 · d(0, M(φ(v))) ≤ d(τ_{φ(v)}) for all v ∈ [k − 1]. And since φ is
injective, Σ_{v=1}^{k−1} 2 · d(0, v) ≤ Σ_{j ∈ [F] \ G} d(τ_j).

Set τ' to consist of the following r-tours: (a) all singleton r-tours (0, v, 0) for v ∈ [k − 1], and (b)
{τ_j : j ∈ G}. Using the above inequality, d(τ') ≤ d(τ). Observe that vertices {1, ..., k − 1} will play
no role in the second stage under τ', since they are already individually covered in a first-stage r-tour.
Moreover, for any vertex v ∈ {k, ..., n}, the set of r-tours containing v is identical in both τ and τ'.
Hence for any scenario q ∈ 𝒟, the recourse action (for vertices {k, ..., n}) under τ is also feasible under
τ'. This implies h(τ', q) ≤ h(τ, q) for all q ∈ 𝒟; and so obj(τ') ≤ obj(τ). Also by construction, the
number of r-tours in τ' is k − 1 + |G| ≤ n. ■
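The construction in this proof can be sketched in Python. The sketch assumes the threshold k is the smallest value with |M^{-1}({k, ..., n})| ≤ n − k + 1, as used in the bound on |G|; the function name `compress_fixed_tour` and the input encoding (each r-tour as a set of vertex labels 1..n, with the depot 0 implicit) are hypothetical.

```python
def compress_fixed_tour(tours, lengths, dist0):
    """Replace a fixed tour by one with at most n r-tours.

    tours   -- list of non-empty sets of vertex labels in 1..n, where
               labels are sorted by non-decreasing distance from the depot
    lengths -- lengths[j] is the length of tours[j]; assumed to be at least
               twice the depot distance of its farthest vertex
    dist0   -- dist0[v] = d(0, v) for v = 0..n
    Returns (singletons, kept, new_length): vertices served by singleton
    r-tours (0, v, 0), indices of the original r-tours kept (the set G),
    and the total length of the new fixed tour.
    """
    n = len(dist0) - 1
    farthest = [max(t) for t in tours]          # M(j) for each r-tour
    k = n + 1
    for cand in range(1, n + 1):
        if sum(1 for mj in farthest if mj >= cand) <= n - cand + 1:
            k = cand
            break
    kept = [j for j, mj in enumerate(farthest) if mj >= k]
    singletons = list(range(1, k))
    new_length = sum(2 * dist0[v] for v in singletons)
    new_length += sum(lengths[j] for j in kept)
    return singletons, kept, new_length
```

On an input with four r-tours over three vertices, three of which only reach vertex 1, the sketch keeps the one r-tour reaching vertex 3 and adds a single round-trip to vertex 1, giving two r-tours of no greater total length.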

Claim 3.3 WLOG, the number of edges used in any recourse tour is at most 2n. 

Proof: Note that any recourse tour (under any outcome q) is a solution to some deterministic VRP 
instance. Since there are at most n demands, and we consider unsplittable routing, it is clear that the 
number of edges used is at most 2n. ■ 

So far we have bounded |X| (Claim 3.2). Next we will show that D = poly(n), which suffices to prove
m = poly(n, A) in Theorem 3.1.

Assume (by enumeration) that the algorithm knows an upper bound B on the optimal value of
instance J (up to factor two), i.e. B/2 ≤ opt(J) ≤ B. Let U ⊆ V denote the vertices at distance at
most B from r. Clearly, the optimal fixed tour does not visit any vertex outside U (otherwise it incurs
cost larger than 2B). So we may always defer demands at V \ U to the second stage (which is what
opt does). And by using the O(1)-approximation algorithm [?] for deterministic VRP to serve V \ U,
the cost incurred by our algorithm on V \ U is at most O(1) · B. Now we can focus on the StochVRP
instance restricted to vertices U.

Consider the following modification to the metric (U, d): contract all edges of length smaller than
B/n^3 and let (W, ℓ) denote the resulting metric of shortest-path distances. We consider the natural
StochVRP instance J'' on (W, ℓ) (where 𝒟 induces the demand distribution on W as well). The useful
property of metric ℓ is that it has maximum distance ≤ 2B and minimum distance ≥ B/n^3; so by
scaling we obtain that it has diameter at most O(n^3). Thus we can apply Theorem 3.1 to instance J''
and m = poly(n, A) samples would suffice. The following lemma relates J'' to the original instance
J.
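The contraction step can be sketched in Python as follows. This is an illustrative implementation choice, not the paper's: it merges endpoints of short edges with a union-find structure and then recomputes shortest paths between the merged classes with Floyd-Warshall. The function name `contract_short_edges` and the matrix encoding of the metric are assumptions.

```python
def contract_short_edges(d, threshold):
    """Contract all edges shorter than `threshold` in the metric `d`.

    d         -- symmetric distance matrix on vertices 0..n-1
    threshold -- edges of length strictly below this value are contracted
    Returns (ell, comp): the shortest-path metric on the contracted vertices
    and comp[u], the contracted vertex that original vertex u belongs to.
    """
    n = len(d)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if d[u][v] < threshold:
                parent[find(u)] = find(v)

    roots = sorted({find(u) for u in range(n)})
    index = {root: i for i, root in enumerate(roots)}
    comp = [index[find(u)] for u in range(n)]
    k = len(roots)
    ell = [[0 if a == b else float("inf") for b in range(k)] for a in range(k)]
    for u in range(n):
        for v in range(n):
            a, b = comp[u], comp[v]
            if a != b:
                ell[a][b] = min(ell[a][b], d[u][v])
    # Floyd-Warshall: shortest-path distances in the contracted graph.
    for mid in range(k):
        for a in range(k):
            for b in range(k):
                ell[a][b] = min(ell[a][b], ell[a][mid] + ell[mid][b])
    return ell, comp
```

In the setting above the threshold would be B/n^3, so every surviving distance is at least B/n^3 while the largest distance stays bounded, which is what yields a polynomial diameter after scaling.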
Lemma 3.4 The optimal value opt(J'') ≤ B. Moreover, any solution to J'' yields a solution to J
with at most a constant factor increase in objective.

Proof: The first part of the claim is trivial since J'' is obtained from J by contracting the metric:
so opt(J'') ≤ opt(J) ≤ B. For the other direction, consider any solution to J'': we now describe the
solution corresponding to this in J. Note that each vertex w ∈ W corresponds to some subset U_w ⊆ U
such that there is a spanning tree on U_w in metric d with each edge of length at most B/n^3. So whenever
vertex w ∈ W is visited in J'', we will visit all vertices of U_w along an Euler tour of MST_d(U_w): this
results in a cost increase of at most 2|U_w| · B/n^3 = O(B/n^2). Note also that each edge e in metric
(W, ℓ) corresponds to some path in metric (U, d) of length at most ℓ_e + n · B/n^3: so each edge traversal
causes a cost increase of at most B/n^2. By Claim 3.2 the cost increase in the fixed tour is at most
n^2 · O(B/n^2) = O(B). Similarly, using Claim 3.3 the cost increase in the recourse tour (under any
outcome) is at most 2n · O(B/n^2) ≤ O(B); so the increase in the expected cost of the recourse tour
is also O(B). Thus any J''-solution corresponds to a J-solution where the increase in objective is at
most O(B) = O(1) · opt(J). ■
Algorithm 3.1 summarizes the StochVRP algorithm.

Algorithm 3.1 Algorithm for StochVRP under black-box distributions
Input: StochVRP instance J = ((V, d), r, Q, A, 𝒟).
1: Guess (by enumeration) value B such that B/2 ≤ opt(J) ≤ B.
2: Restrict instance to vertices U = {v ∈ V : d(r, v) ≤ B}. Vertices V \ U are handled separately,
   always in the recourse tour (costs O(B) in expectation).
3: Modify metric (U, d) to (W, ℓ) by contracting edges shorter than B/n^3 and recomputing shortest
   paths. By scaling, D = diameter(W, ℓ) ≤ O(n^3). J'' is the induced StochVRP instance on (W, ℓ).
4: Apply Theorem 3.1 to J'' to obtain a (random) instance J' of StochVRP with explicit scenarios
   and m = poly(n, A).
5: Obtain fixed tour τ for J' using the algorithm in Section 2. Theorem 3.1 implies that this is a fixed
   tour for J'' with increase in objective being at most one w.h.p.
6: By Lemma 3.4, τ is also a fixed tour for J.
7: Output the fixed tour for J containing five copies of each r-tour in τ.
8: Output recourse action (for any q) as Aug(τ, q) given in Algorithm 3.2.



3.2 Specifying Recourse Actions 

The recourse strategy involves the following outlier VRP problem: given a fixed tour τ (as collection
{τ_1, ..., τ_F} of r-tours) and outcome q ∈ {0, ..., Q}^V, find

• a subset of vertices whose demands q_A ⊆ q can be served by the existing route τ, subject to the
capacity constraint of Q on its r-tours; and

• a minimum cost VRP solution to the residual demands q − q_A.

The optimal value of this instance is exactly function h(τ, q) defined in Subsection 3.1. We remark that
when the capacity Q = 1, the outlier VRP problem can be solved exactly using a minimum cost flow
formulation. When the fixed tour τ = ∅ we obtain the usual VRP, which is NP-hard for Q ≥ 3. Another
special case of outlier VRP is the restricted assignment problem [27]. This occurs when V denotes the
set of jobs with sizes q, there are F machines, and potential job-machine assignments are given by τ
(job v can be assigned machine j iff v ∈ τ_j); there is an assignment of makespan Q iff the outlier VRP
optimum is zero. So it is NP-hard to obtain any true approximation ratio for outlier VRP. Instead
we give an (O(1), O(1)) bicriteria approximation algorithm, which suffices to obtain an algorithm for
StochVRP with only constant-factor increase over Theorem 2.3.

The algorithm is based on a natural LP relaxation to outlier VRP. Consider a solution with $S \subseteq V$ as the vertices chosen to be served by $\tau$. Then:

• There is an assignment $\phi : S \to [F]$ such that (1) $v \in \tau_{\phi(v)}$ for all $v \in S$; and (2) for each r-tour $j \in [F]$, the total demand assigned to it satisfies $\sum_{v \in \phi^{-1}(j)} q_v \le Q$.

• The objective value is the optimum VRP on metric $(V, d)$, depot $r$, capacity $Q$ and demands $\{q_v : v \in V \setminus S\}$. Using known lower bounds for VRP [22, 2], at the loss of a constant factor, this is just $\mathrm{MST}(V \setminus S) + \mathrm{Flow}(V \setminus S)$, where for any $T \subseteq V$, $\mathrm{MST}(T)$ is the length of a minimum spanning tree on $\{r\} \cup T$, and $\mathrm{Flow}(T) := \frac{2}{Q} \sum_{v \in T} q_v \cdot d(r, v)$.
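The two lower bounds above are easy to evaluate directly. The following is a minimal sketch, assuming the metric is supplied as a Python function `dist(u, v)` and demands as a dictionary; the function names are illustrative and not from the paper.

```python
def mst_length(root, terminals, dist):
    """Length of a minimum spanning tree on {root} and the terminals (Prim's algorithm)."""
    nodes = [v for v in terminals if v != root]
    # best[v] = cheapest edge from v to the tree built so far
    best = {v: dist(root, v) for v in nodes}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist(v, u))
    return total


def flow_bound(root, demands, dist, Q):
    """Flow(T) = (2/Q) * sum over v of q_v * d(r, v)."""
    return (2.0 / Q) * sum(q * dist(root, v) for v, q in demands.items())


def vrp_lower_bound(root, demands, dist, Q):
    """MST(T) + Flow(T) for the vertex set T carrying the given demands."""
    return mst_length(root, list(demands), dist) + flow_bound(root, demands, dist, Q)
```

For instance, on a line metric with the depot at 0 and unit demands at points 1 and 2, capacity `Q = 2` gives an MST length of 2 and a flow term of 3.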

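Thus we can write the following integer programming formulation for outlier VRP, at the loss of an $O(1)$-factor.

\[
\begin{aligned}
\min\ \ & \sum_{e \in E} d_e \cdot z_e + \frac{2}{Q} \sum_{v \in V} d(r, v) \cdot q_v \cdot (1 - x_v) && && (2)\\
\text{s.t.}\ \ & \sum_{v \in \tau_j} q_v \cdot y_{v,j} \le Q && \forall j \in [F], && (3)\\
& \sum_{j \,:\, v \in \tau_j} y_{v,j} = x_v && \forall v \in V, && (4)\\
& \sum_{e \in \delta(U)} z_e \ge 1 - x_v && \forall U \subseteq V \setminus \{r\},\ \forall v \in U, && (5)\\
& x_v,\ y_{v,j} \in \{0, 1\} && \forall v \in V,\ \forall j \in [F],\\
& z_e \ge 0 && \forall e \in E.
\end{aligned}
\]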

Above, $x_v$ is one iff $v \in S$, i.e. $v$ is served by $\tau$. Variables $y_{v,j}$ denote the assignment $\phi : S \to [F]$. Constraint (4) ensures that each $v \in S$ is assigned to some $\phi(v)$ such that $v \in \tau_{\phi(v)}$. Constraint (3) enforces that the total assignment to each r-tour is at most $Q$. Also $E = \binom{V}{2}$ denotes the edge-set of the metric, and for any $U \subseteq V$, $\delta(U)$ denotes the edges with exactly one vertex in $U$. Constraint (5) says that $\{z_e : e \in E\}$ is a fractional spanning tree connecting the vertices $\{v : x_v = 0\} = V \setminus S$ to $r$. In the objective (2), the first term is the length of the fractional spanning tree (corresponding to $\mathrm{MST}(V \setminus S)$), and the second term is $\mathrm{Flow}(V \setminus S)$. Dropping the integrality gives us an LP relaxation $\mathrm{LP}(\tau, q)$. We can solve this LP in polynomial time, and next we describe a rounding algorithm.

Observe that the solution from Algorithm 3.2 uses five copies of the fixed tour $\tau$, whereas we will bound the cost against $\mathrm{LP}(\tau, q)$.

First we show that our assignment of $S$ to the fixed tour is indeed feasible (using 5 copies). Setting $a_{v,j} = y_{v,j} / x_v \le 2\, y_{v,j}$ for $v \in S$, $j \in [F]$ gives a feasible fractional solution to the restricted assignment instance in Step 3 of Algorithm 3.2: this follows from constraints (3) and (4) using $\{x_v \ge \frac{1}{2}\}_{v \in S}$. Thus the rounding algorithm from [27] can be employed on $a$ to obtain an integral solution $\phi$ having load at most $2Q + \max_v q_v \le 3Q$ for each $j \in [F]$. Then for each $j \in [F]$, we partition $\phi^{-1}(j)$ starting with the trivial partition into singletons and greedily merging parts as long as each part has demand at most $Q$. At termination any two parts together exceed $Q$, so six or more parts would have total demand above $3Q$; hence this results in at most 5 parts. Thus the vertices $S$ can be feasibly assigned to $5\tau$.

Algorithm 3.2 Algorithm for computing recourse action Aug($\tau$, q).

Input: $(V, d)$, $r$, capacity $Q$, fixed tour $\tau = \{\tau_1, \ldots, \tau_F\}$, outcome $q$.

1: Let $(x, y, z)$ denote the optimal LP solution.
2: Set $S \leftarrow \{v \in V : x_v \ge \frac{1}{2}\}$.
3: Consider the following instance of restricted assignment:
     $\sum_{v \in S \cap \tau_j} q_v \cdot a_{v,j} \le 2Q$, for all $j \in [F]$,
     $\sum_{j \in [F] \,:\, v \in \tau_j} a_{v,j} = 1$, for all $v \in S$,
     $0 \le a_{v,j} \le 1$, for all $v \in S$, $j \in [F]$.
4: By the rounding algorithm in [27] we can obtain an integral assignment $\phi : S \to [F]$ such that $v \in \tau_{\phi(v)}$ for all $v \in S$ and $q(\phi^{-1}(j)) \le 3Q$ for all $j \in [F]$.
5: For each $j \in [F]$, partition $\phi^{-1}(j) = \bigsqcup_{l=1}^{5} S_{j,l}$ into at most five parts such that $q(S_{j,l}) \le Q$ for each $l$. This can be done by a greedy algorithm.
6: Output for each $j \in [F]$ and $l \in \{1, \ldots, 5\}$, vertices $S_{j,l}$ as served by $\tau_j$.
7: Output an $O(1)$-approximate VRP solution on vertices $V \setminus S$ as recourse tour.
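The greedy partition in Step 5 can be sketched as follows. This is an illustrative implementation only, assuming the demands assigned to one r-tour are given as a list `sizes` with each entry at most `Q` and total at most `3 * Q`; the name `greedy_merge` is not from the paper.

```python
def greedy_merge(sizes, Q):
    """Start from singletons and merge two parts while the merged load stays <= Q.

    Returns a list of parts, each a list of indices into `sizes`.
    """
    parts = [[i] for i in range(len(sizes))]
    loads = list(sizes)
    merged = True
    while merged:
        merged = False
        for a in range(len(parts)):
            for b in range(a + 1, len(parts)):
                if loads[a] + loads[b] <= Q:
                    parts[a].extend(parts[b])
                    loads[a] += loads[b]
                    del parts[b], loads[b]
                    merged = True
                    break
            if merged:
                break
    return parts
```

When the loop stops, every pair of parts has combined load above `Q`, which is what limits the number of parts to five when the total is at most `3 * Q`.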



Next we bound the cost of our recourse tour by $O(1) \cdot \mathrm{LP}(\tau, q)$. Observe that $x_v < \frac{1}{2}$ for each $v \in V \setminus S$: so constraint (5) implies that $2 \cdot z$ is a fractional spanning tree on $\{r\} \cup (V \setminus S)$. Hence $\mathrm{MST}(V \setminus S) \le 4\, (d \cdot z)$. Moreover, it is clear that $\mathrm{Flow}(V \setminus S) \le 2 \cdot \frac{2}{Q} \sum_{v \in V} d(r, v) \cdot q_v \cdot (1 - x_v)$. Thus the LP objective satisfies:

\[ \mathrm{LP}(\tau, q) \ \ge\ \tfrac{1}{4} \cdot \mathrm{MST}(V \setminus S) + \tfrac{1}{2} \cdot \mathrm{Flow}(V \setminus S). \]

Since the VRP algorithm (on demands $V \setminus S$) achieves a constant approximation relative to these lower bounds, it follows that the recourse cost is $O(1) \cdot \mathrm{LP}(\tau, q)$.

Theorem 3.5 There is an $(O(1), 5)$-bicriteria approximation algorithm for outlier VRP, that uses the fixed tour at most five times.



4 Algorithm for Ratio Knapsack Rank-function Orienteering 

In this section we give an improved result for the ratio version of submodular orienteering, when the objective is a sum of "knapsack rank-functions". This can be used as a subroutine for StochVRP to yield Theorem 1.1.

An instance of the knapsack rank-function orienteering problem (KnapRankOrient) consists of metric $(V, d)$ with root $r$ and length bound $B$. The objective is a sum of $m$ knapsack rank-functions $f_1, \ldots, f_m : 2^V \to \mathbb{R}_+$. For a solution visiting vertices $U$, the objective value is $\sum_{i=1}^m f_i(U)$. The goal is to find an r-tour of length at most $B$ having maximum objective. Each knapsack rank-function $f_i$ is:

\[ f_i(U) := \max \Big\{ \sum_{v \in S} w^i_v \ :\ S \subseteq U,\ \sum_{v \in S} c^i_v \le 1 \Big\}, \qquad \forall U \subseteq V. \]

Above $w^i : V \to \mathbb{R}_+$ and $c^i : V \to [0, 1]$ denote profits and sizes at the vertices; so $f_i(U)$ is the maximum profit in any subset of $U$ having size at most one. (Although a knapsack rank-function may not be submodular, it can always be approximated within factor two by a submodular function, as discussed at the end of an earlier section.)
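To make the definition concrete, here is a brute-force evaluation of a single knapsack rank-function. It is a sketch for small vertex sets only (exponential time), with profits `w` and sizes `c` passed as dictionaries; none of these names come from the paper.

```python
from itertools import combinations


def knapsack_rank(U, w, c):
    """Maximum total profit of a subset of U whose total size is at most one."""
    U = list(U)
    best = 0
    for k in range(len(U) + 1):
        for S in combinations(U, k):
            # small tolerance so that sizes summing to exactly 1 are accepted
            if sum(c[v] for v in S) <= 1.0 + 1e-12:
                best = max(best, sum(w[v] for v in S))
    return best
```

Taking `w[v] = c[v] = 1` on the vertices of a group and 0 elsewhere gives value 1 exactly when `U` meets the group, which is the group Steiner special case mentioned under Related Work.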

Here we consider the ratio version of this problem, called ratio KnapRankOrient, where given metric $(V, d)$ with root $r$ and knapsack rank-functions $f_1, \ldots, f_m$, the goal is:

\[ \max \Big\{ \frac{\sum_{i=1}^m f_i(V(\tau))}{d(\tau)} \ :\ \tau \text{ is an r-tour} \Big\}. \]

Above $V(\tau) \subseteq V$ are the vertices visited by $\tau$ and $d(\tau)$ is its length.

Note that the max-coverage problem for first stage sets corresponds exactly to ratio KnapRankOrient. Here we obtain an $O(\log^2 n)$-approximation algorithm for ratio KnapRankOrient, which combined with the algorithm in Section 3 implies Theorem 1.1.

Related Work. KnapRankOrient is closely related to the group Steiner tree problem, where given metric $(V, d)$ with root $r$ and $m$ groups $G_1, \ldots, G_m \subseteq V$ of vertices, the goal is to find a minimum length tree connecting $r$ to at least one vertex of each group. The best known approximation ratio for this problem is $O(\log^2 n \cdot \log m)$ [17]. Formally, ratio KnapRankOrient generalizes the "density group Steiner" problem [0], which involves finding an r-tour maximizing the ratio of number of groups covered to its length: setting $w^i_v = c^i_v = \mathbf{1}[v \in G_i]$ in ratio KnapRankOrient gives us density group Steiner. Our
algorithm builds on B; however the previous algorithm is not directly applicable since the natural LP 
relaxation to ratio KnapRankOrient seems weak. We strengthen the LP relaxation for KnapRankOrient by adding extra constraints (see constraints (11) below), that are motivated by the related covering Steiner tree problem [26]. Moreover, as with all LP-based algorithms for group Steiner type problems, the main rounding step is the dependent randomized rounding (called GKR rounding below) from [17]. Even with a good LP relaxation and rounding step, previously used analysis such as [17, 26, 21] is inadequate for bounding the profit in ratio KnapRankOrient, as shown next.

Example 1: Consider a star-like tree with a single edge $(r, u)$ from the root and $t$ other edges $(u, v_1), \ldots, (u, v_t)$, all of unit length. There is a single knapsack with zero sizes ($c = 0$), and profits $w$ of one at each vertex of $V' = \{v_1, \ldots, v_t\}$ (and zero at $r, u$). Suppose the LP solution sets value $x = 1/t^2$ for all edges. The fractional profit is $\mu = 1/t$. The analysis in all of [17, 26, 21] attempts to upper bound:

\[ \Pr[\text{number of } V'\text{-vertices chosen in GKR rounding} < \mu/2]. \qquad (6) \]

In this example, the solution (from GKR rounding) is the entire tree with probability $1/t^2$, and empty otherwise. So the probability (6) is $1 - 1/t^2$, which by itself would only imply an expected profit of $1/t^2 \ll \mu$ (although the actual expected profit is $1/t = \mu$). Such an analysis is sufficient when $\mu = \Omega(1)$, as in [17] and related prior work, but we are not guaranteed this in ratio KnapRankOrient.
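The numbers in Example 1 can be checked with exact arithmetic. The sketch below assumes the standard GKR rule that an edge is kept with probability $x_e / x_{\pi(e)}$ given its parent edge's value (with value 1 above the root); the function names are illustrative.

```python
from fractions import Fraction


def gkr_edge_prob(x_edge, x_parent):
    """Probability that GKR rounding keeps an edge, given its parent edge's LP value."""
    return x_edge / x_parent


def example1(t):
    """Return (fractional profit, expected profit, Pr[profit < mu/2]) for Example 1."""
    x = Fraction(1, t * t)                      # LP value on every edge
    p_root_edge = gkr_edge_prob(x, Fraction(1)) # edge (r, u); value 1 above the root
    p_leaf_edge = gkr_edge_prob(x, x)           # each edge (u, v_i) is kept with probability 1
    mu = t * x                                  # one unit of profit at each of the t leaves
    expected_profit = t * p_root_edge * p_leaf_edge
    # The tree is all-or-nothing: profit is t if (r, u) survives and 0 otherwise.
    p_low_profit = 1 - p_root_edge
    return mu, expected_profit, p_low_profit
```

With `t = 10` the expected profit equals the fractional profit 1/10, while the event in (6) has probability 99/100.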

Instead of using a bound on the probability (^), we directly lower bound the expected profit using 
a different analysis. In particular our main idea is to use an alteration step after GKR rounding (see 
Lemma 4.1). While alteration has been used with LP-rounding before, eg. p5| , we are not aware of an 



application in context of the group Steiner tree problem; moreover we only use alteration in our analysis 
and not in the algorithm. 



In the next Subsection 4.1 we present the LP relaxation that we use. In Subsection 4.2 we show that the GKR rounding ensures (i) high expected profit and (ii) low expected length (individually). Then in Subsection 4.3 we use the derandomization of GKR rounding, and show that it leads to a single (deterministic) solution having a high profit/length ratio. Altogether we obtain an $O(\log^2 n)$-approximation algorithm for ratio KnapRankOrient.

4.1 LP relaxation



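At the loss of an $O(\log n)$ factor in the approximation ratio, we can assume that the metric is a tree $T$ (with edge set $E(T)$) rooted at $r$ having $\ell = O(\log n)$ levels [16]. We also enumerate over all choices of the length $B$ of the optimal ratio KnapRankOrient solution (see the footnote at the end of this subsection). Then we use an LP relaxation similar to the LP for group Steiner tree [17].

First some notation: For any edge $e$ in $T$, we denote by $\pi(e)$ its parent edge. Similarly $\pi(v)$ is the parent edge of any vertex $v \in V$. For root edges $e$, the $x_{\pi(e)}$ values are fixed to 1, since the root is always part of the solution. For any $e \in E(T)$, the subtree below edge $e$ is denoted $T_e$.

\[
\begin{aligned}
\mathrm{LP}(B) = \max\ \ & \sum_{i=1}^{m} \sum_{v \in V} w^i_v \cdot z^i_v && && (7)\\
\text{s.t.}\ \ & x_{\pi(e)} \ge x_e && \forall e \in T, && (8)\\
& z^i_v \le x_{\pi(v)} && \forall v \in V,\ i \in [m], && (9)\\
& \sum_{v \in V} c^i_v \cdot z^i_v \le 1 && \forall i \in [m], && (10)\\
& \sum_{v \in T_e} c^i_v \cdot z^i_v \le x_e && \forall e \in T,\ i \in [m], && (11)\\
& \sum_{e \in E(T)} d_e \cdot x_e \le \frac{B}{2}, && && (12)\\
& 0 \le x, z \le 1.
\end{aligned}
\]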

Let us show that restricting $x$ and $z$ to integer values gives a valid formulation of KnapRankOrient. In the intended integral solution, $x_e$ is an indicator denoting whether or not edge $e$ is chosen. Constraints (8) ensure monotonicity, i.e. that the solution is a subtree rooted at $r$. Constraint (12) bounds the length of the subtree by $B/2$ (so the corresponding Euler tour has length at most $B$ as required). Vertex $v$ is visited by the solution iff $x_{\pi(v)} = 1$; let $U = \{v \in V : x_{\pi(v)} = 1\}$ denote the vertices visited. For each knapsack rank function $i \in [m]$, variables $z^i$ denote its maximum profit subset $S_i = \{v \in V : z^i_v = 1\}$. By (9), $S_i \subseteq U$ as required. Moreover (10) ensures that the total size of $S_i$ (in the $i$-th knapsack) is at most one. Finally, the objective (7) is the sum of profits from each knapsack.

Although we do not need constraint (11) to show a valid integer programming formulation, it is crucial in the rounding step. A similar constraint was used in [26] for the related covering Steiner problem. Notice that it indeed holds for integral solutions, so the resulting LP is a valid relaxation of KnapRankOrient. For a fractional solution, (11) says that even conditional on edge $e$ being chosen, the total size (in knapsack $i$) from subtree $T_e$ is at most one.

Algorithm Overview. For each estimate $B$, we solve the above $\mathrm{LP}(B)$ and apply the deterministic rounding algorithm in Subsection 4.3, which guarantees a solution having profit/length ratio at least $\Omega(1/\ell) \cdot \mathrm{LP}(B)/B$. Finally we output the best ratio solution amongst all $B$s. Note, if $B^*$ denotes the length of the optimal ratio KnapRankOrient solution then $\mathrm{LP}(B^*)/B^*$ is at least the optimum ratio. So with $B \approx B^*$ we obtain an $O(\ell)$-approximation to ratio KnapRankOrient.
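The enumeration over length estimates can be sketched as a geometric search. This is only an illustration: the callback `solve_for_length` (standing in for "solve LP(B) and round") and the bounds `min_len`, `max_len` on the optimal length are hypothetical names, and powers of two are one convenient way to hit $B^*$ within a constant factor.

```python
def length_guesses(min_len, max_len):
    """Powers-of-two multiples of min_len; every B* in [min_len, max_len]
    has some guess B with B* <= B < 2 * B*."""
    guesses = []
    B = min_len
    while B < 2 * max_len:
        guesses.append(B)
        B *= 2
    return guesses


def best_over_guesses(guesses, solve_for_length):
    """Run the (hypothetical) per-length solver and keep the best profit/length ratio."""
    best = None
    for B in guesses:
        profit, length = solve_for_length(B)
        if length > 0 and (best is None or profit / length > best[0]):
            best = (profit / length, B)
    return best
```

The number of guesses is logarithmic in `max_len / min_len`, which is polynomial in the input size.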



Footnote: It suffices to know the length up to a constant factor; so there are only polynomially many choices.

4.2 Expectation guarantee in KnapRankOrient

Here we show that the natural GKR rounding step produces a solution having expected length at most $B$ and expected profit at least $\Omega(\mathrm{LP}(B)/\ell)$. Note that this does not bound the expectation of profit/length. Still, this property is used by the algorithm in the next subsection to produce a deterministic solution to ratio KnapRankOrient having value $\Omega(1/\ell) \cdot \mathrm{LP}(B)/B$.

Algorithm 4.1 LP rounding for KnapRankOrient

1: Solve the LP relaxation $\mathrm{LP}(B)$ for KnapRankOrient to obtain $(x, z)$.
2: Perform the (dependent) rounding from [17], i.e. choose each edge $e \in E(T)$ independently with probability $x_e / x_{\pi(e)}$ and retain only the subtree $F \subseteq T$ connected to $r$.
3: for $i \in [m]$ do
4:    For each vertex $v \in V(F)$ choose $v$ into $S_i$ independently with probability $z^i_v / x_{\pi(v)}$.
5:    If $c^i(S_i) > 4\ell$ then set $R_i \leftarrow \emptyset$, else $R_i \leftarrow S_i$.
6: output r-tour corresponding to $F$.
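Steps 2 to 5 of Algorithm 4.1 can be sketched as below. The data layout is an assumption made for illustration: vertices are listed parents-first in `order`, `x[v]` stores the LP value of the edge entering `v` (with `x[root] = 1`), and `z[i]`, `c[i]` are per-knapsack dictionaries. The function name is not from the paper.

```python
import random


def gkr_round_with_alteration(order, parent, x, z, c, ell, rng):
    """One run of GKR rounding followed by the alteration step.

    Returns (F, R): the vertices of the retained subtree and, per knapsack, the set R_i.
    """
    in_F = {}
    for v in order:
        p = parent[v]
        if p is None:
            in_F[v] = True  # the root is always part of the solution
            continue
        # keep the edge into v with probability x_e / x_parent(e)
        coin = x[p] > 0 and rng.random() < x[v] / x[p]
        # v is retained only if it is still connected to the root
        in_F[v] = in_F[p] and coin
    F = [v for v in order if in_F[v]]

    R = []
    for z_i, c_i in zip(z, c):
        # choose each retained vertex into S_i with probability z_v / x_parent-edge(v)
        S = [v for v in F if x[v] > 0 and rng.random() < z_i[v] / x[v]]
        # alteration: drop S_i entirely if its total size exceeds 4 * ell
        R.append(S if sum(c_i[v] for v in S) <= 4 * ell else [])
    return F, R
```

A caller would pass `rng = random.Random(seed)` for reproducibility; as in the paper, the alteration only matters for the analysis, since the output tour depends on `F` alone.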



In the above algorithm we assume that if $e$ is a root edge then $x_{\pi(e)} = 1$, since the root is always a part of the solution. Steps 2 and 4 are the GKR rounding, and Step 5 is the alteration step. It is clear that in Step 2 we have $\Pr[e \in F] = x_e$ for each edge $e$, and so $\Pr[v \in F] = x_{\pi(v)}$ for each vertex $v$. Note that the expected length of $F$ is:

\[ E\Big[ \sum_{f \in F} d_f \Big] = \sum_{f \in E(T)} d_f \cdot x_f \ \le\ \frac{B}{2}. \qquad (13) \]

So taking an Euler tour of $F$, the expected solution length is at most $B$. It remains to bound the expected profit. Notice that for each knapsack $i \in [m]$, we have $R_i \subseteq F$ and $c^i(R_i) \le 4\ell$ with probability one, from Step 5. So a greedy partitioning of $R_i$ yields at most $8\ell$ parts each of size at most one; and by averaging some part has profit at least $w^i(R_i)/(8\ell)$. Thus:

\[ E\Big[ \sum_{i=1}^{m} f_i(F) \Big] \ \ge\ \frac{1}{8\ell} \sum_{i=1}^{m} E\big[ w^i(R_i) \big] = \frac{1}{8\ell} \sum_{i=1}^{m} \sum_{v \in V} w^i_v \cdot \Pr[v \in R_i]. \qquad (14) \]



We bound $\Pr[v \in R_i]$ in Lemma 4.1 below. Before doing that, we introduce some notation that will also be useful in the subsequent derandomization step. For any $i \in [m]$ and $u, v \in V$ let $I^i_v$ denote the indicator of the event "$v \in S_i$"; and let $I^i_{u,v}$ denote the indicator of the event "$u \in S_i$ and $v \in S_i$". Also for $i \in [m]$ and $v \in V$ let $J^i_v$ indicate whether "$v \in R_i$". Due to Step 5 it is clear that

\[ J^i_v \ \ge\ I^i_v - \frac{1}{4\ell} \sum_{u \in V} c^i_u \cdot I^i_{u,v}, \qquad \forall i \in [m] \text{ and } v \in V. \qquad (15) \]



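Lemma 4.1 For each $i \in [m]$ and $v \in V$,

\[ \Pr[v \in R_i] = E[J^i_v] \ \ge\ E[I^i_v] - \frac{1}{4\ell} \sum_{u \in V} c^i_u \cdot E[I^i_{u,v}] \ \ge\ \frac{z^i_v}{4}. \]

Proof: Observe that $\Pr[v \in S_i] = E[I^i_v] = z^i_v$. Let us now condition on $I^i_v = 1$, i.e. $\{v \in S_i\}$. Let $(e_1, \ldots, e_\ell)$ denote the edges on the path from $r$ to $v$; clearly all these edges are in $F$. For any $j \in \{1, \ldots, \ell\}$ define $T'_j \subseteq T_{e_j} \setminus \{v\}$ to be those vertices whose least common ancestor with vertex $v$ is edge $e_j$. Also define $T'_0$ to be the vertices whose least common ancestor with vertex $v$ is the root $r$; set $x_{e_0} = 1$. For any $j \in \{0, 1, \ldots, \ell\}$ and $u \in T'_j$ notice that:

\[ E[I^i_u \mid I^i_v = 1] = \Pr[u \in S_i \mid v \in S_i] = \Pr[u \in S_i \mid e_j \in F] = \frac{z^i_u}{x_{e_j}}. \]

Taking expectations, using (11) and $T'_j \subseteq T_{e_j}$ we have for each $j \in [\ell]$,

\[ \sum_{u \in T'_j} c^i_u \cdot E[I^i_u \mid I^i_v = 1] = \frac{1}{x_{e_j}} \sum_{u \in T'_j} c^i_u \cdot z^i_u \ \le\ 1. \]

For $j = 0$ the same bound follows from (10), since $x_{e_0} = 1$. Observe that $\bigcup_{j=0}^{\ell} T'_j = V \setminus \{v\}$. So summing the above,

\[ \sum_{u \in V \setminus \{v\}} c^i_u \cdot E[I^i_u \mid I^i_v = 1] = \sum_{j=0}^{\ell} \sum_{u \in T'_j} c^i_u \cdot E[I^i_u \mid I^i_v = 1] \ \le\ \ell + 1. \qquad (16) \]

By Inequality (15), which is due to the alteration Step 5,

\[ E[J^i_v \mid I^i_v = 1] \ \ge\ 1 - \frac{1}{4\ell} \sum_{u \in V} c^i_u \cdot E[I^i_u \mid I^i_v = 1] = 1 - \frac{c^i_v}{4\ell} - \frac{1}{4\ell} \sum_{u \in V \setminus \{v\}} c^i_u \cdot E[I^i_u \mid I^i_v = 1] \ \ge\ 1 - \frac{\ell + 2}{4\ell} \ \ge\ \frac{1}{4}. \]

The second last inequality is by (16) and the fact that $c^i_v \le 1$; the last inequality uses $\ell \ge 1$. The lemma now follows since $E[I^i_v] = z^i_v$. ■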
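Combining (14) and Lemma 4.1 we have:

\[ E\Big[ \sum_{i=1}^{m} f_i(F) \Big] \ \ge\ \frac{1}{8\ell} \sum_{i=1}^{m} \sum_{v \in V} w^i_v \Big( E[I^i_v] - \frac{1}{4\ell} \sum_{u \in V} c^i_u \cdot E[I^i_{u,v}] \Big) \ \ge\ \frac{1}{32\ell} \sum_{i=1}^{m} \sum_{v \in V} w^i_v \cdot z^i_v. \qquad (17) \]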



Notice that (13) and (17) bound the expected length and profit respectively. This is not sufficient for the ratio KnapRankOrient problem. It would suffice to bound the length and profit simultaneously (instead of in expectation). But this is not possible: Consider the instance in Example 1 and the fractional solution with $x_e = 1/t$ for all edges. This is feasible to the above LP with $B \approx 2$, and has profit $\mathrm{LP}(B) \ge 1$. In this example, Algorithm 4.1 produces the following integral solution: the entire tree (profit = length = $t$), with probability $1/t$, and the empty tree (profit = length = 0) otherwise. Neither of these solutions satisfies bounds on both profit and length.

Instead, we show that one can obtain a solution of high ratio "profit/length" by derandomizing 



Algorithm 4.1. This deterministic algorithm uses pessimistic estimators and is similar to M; however 



the details are quite different since we analyze a different random process. 
4.3 Deterministic Algorithm for Ratio KnapRankOrient 



For the randomized Algorithm 4.1 recall the indicator variables $I^i_v$ and $I^i_{u,v}$. Also let $K_f$ for any edge $f \in E(T)$ denote the indicator of the event "$f \in F$". Define the following estimators for profit and length:

\[ P := \sum_{i=1}^{m} \sum_{v \in V} w^i_v \Big( I^i_v - \frac{1}{4\ell} \sum_{u \in V} c^i_u \cdot I^i_{u,v} \Big) \qquad \text{and} \qquad D := \sum_{f \in E(T)} d_f \cdot K_f. \qquad (18) \]

We have $E[P]/E[D] \ge \frac{1}{4B} \sum_{i=1}^{m} \sum_{v \in V} w^i_v \cdot z^i_v = \mathrm{LP}(B)/(4B)$ by (17) and (13), which is our initial estimate of the ratio. We will inspect edges $e$ of $T$ one at a time and decide (deterministically) whether or not to include $e$ so that the ratio estimate does not decrease. At the end of the algorithm, we obtain a deterministic subtree $F^*$ with ratio at least $E[P]/E[D]$. Details now follow.

To keep notation simple, we extend tree $T$ slightly. For each $v \in V$ add leaves $\{(i, v) : i \in [m]\}$ that are adjacent to $v$ (with zero length edges). Let $L_i = \{(i, v) : v \in V\}$ denote the leaves corresponding to knapsack $i \in [m]$. Define the following fractional values $y^*$ on edges according to the optimal LP solution $(x, z)$:

\[ y^*_e := \begin{cases} x_e & \text{if } e \text{ is an edge in the original tree,} \\ z^i_v & \text{if } e = (v, (i, v)) \text{ is a new leaf-edge.} \end{cases} \]

Notice that the GKR rounding steps 2 and 4 in Algorithm 4.1 correspond to choosing each edge $e$ in the modified tree independently w.p. $y^*_e / y^*_{\pi(e)}$ and retaining the subtree $F$ connected to $r$. (Recall that $\pi$ maps each edge/vertex to its parent edge.) Moreover, for each $i \in [m]$ the subset $S_i = V(F) \cap L_i$.

In our algorithm we will be dealing with trees $T$ (with vertex set $V(T)$ and edge set $E(T)$) derived from the extended tree, and edge-weights $y$ (on $T$) derived from $y^*$. We will always have the property that $y$ is non-increasing from the root $r$ to any leaf. For any tree $T$ and edge-weights $y$, define:

\[ P(y, T) := \sum_{i=1}^{m} \sum_{v \in V(T) \cap L_i} w^i_v \Big( y_{\pi(v)} - \frac{1}{4\ell} \sum_{u \in V(T) \cap L_i} c^i_u \cdot \frac{y_{\pi(u)} \cdot y_{\pi(v)}}{y_{\theta(u,v)}} \Big) \qquad \text{and} \qquad D(y, T) := \sum_{f \in E(T)} d_f \cdot y_f, \qquad (19) \]

where $\theta(u, v)$ denotes the least-common-ancestor edge of vertices $u$ and $v$. For ease of notation, we assume WLOG that there is always a dummy edge above the root $r$ having $y$-value one. Observe that these values correspond precisely to the expectation of random variables $P$ and $D$ from (18), in tree $T$ when GKR rounding is performed with edge-values $y$.
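The estimators in (19) can be evaluated directly once each leaf is described by its root-to-leaf path. The sketch below assumes this path representation (lists of edge identifiers), strictly positive $y$-values on the edges involved, and dictionaries for $w$, $c$, $d$ and $y$; all names are illustrative rather than taken from the paper.

```python
from fractions import Fraction


def lca_value(path_u, path_v, y):
    """y-value of the deepest edge shared by two root-to-leaf edge paths
    (1 for the dummy edge above the root when they share no edge)."""
    value = Fraction(1)
    for eu, ev in zip(path_u, path_v):
        if eu != ev:
            break
        value = y[eu]
    return value


def estimator_D(d, y):
    """D(y, T): sum of d_f * y_f over the edges of the tree."""
    return sum(d[f] * y[f] for f in d)


def estimator_P(leaves, w, c, paths, y, ell):
    """P(y, T) from (19); leaves[i] lists the leaves of knapsack i and
    paths[v] lists edge ids from the root down to (and including) v's leaf edge."""
    total = Fraction(0)
    for i, L_i in enumerate(leaves):
        for v in L_i:
            y_v = y[paths[v][-1]]
            penalty = sum(
                c[i][u] * y[paths[u][-1]] * y_v / lca_value(paths[u], paths[v], y)
                for u in L_i
            )
            total += w[i][v] * (y_v - penalty / (4 * ell))
    return total
```

On the tree of Example 1 with two leaves, all $y$-values equal to 1/2 and zero sizes, this gives $P = 1$ and $D = 3/2$.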



Lemma 4.2 At any iteration in Algorithm 4.2 after Step 9, we have $P(y, T \cup F) = y_e \cdot P(y', T') + (1 - y_e) \cdot P(y'', T'') = y_e \cdot P_1 + (1 - y_e) \cdot P_0$ and $D(y, T \cup F) = y_e \cdot D(y', T') + (1 - y_e) \cdot D(y'', T'') = y_e \cdot D_1 + (1 - y_e) \cdot D_0$.



Proof: Consider first the equation for $D$. We have:

\[ D(y', T') = d_e + \sum_{f \in E(F)} d_f + \sum_{g \in E(T_2)} d_g \cdot \frac{y_g}{y_e} + \sum_{g \in E(T_1)} d_g \cdot y_g, \]

\[ D(y'', T'') = \sum_{f \in E(F)} d_f + \sum_{f \in E(T_1)} d_f \cdot y_f. \]

It follows that in the convex combination $y_e \cdot D(y', T') + (1 - y_e) \cdot D(y'', T'')$, each edge $f \in E(F)$ contributes $d_f$ and each edge $g$ in $T = T_1 \cup T_2 \cup \{e\}$ contributes $d_g \cdot y_g$. This is exactly $D(y, T \cup F)$.

Algorithm 4.2 Deterministic algorithm for Ratio KnapRankOrient

1: Solve the LP relaxation for KnapRankOrient to obtain $(x, z)$.
2: Extend tree $T$ by adding leaves $\bigcup_{i=1}^{m} L_i$ and define $y^*$ as above.
3: Initialize tree $F \leftarrow \{r\}$ and $y \leftarrow y^*$.
4: while there is an edge $e \in T$ incident to $r$ do
5:    Let $T_2 = T_e$ denote the subtree below $e$ and $T_1 = T \setminus T_2 \setminus \{e\}$; see Figure 1.
6:    Set $T'' \leftarrow T_1 \cup F$, and $y''_f \leftarrow y_f$ if $f \in T_1$; $y''_f \leftarrow 1$ if $f \in F$.
7:    Compute estimates $P_0 = P(y'', T'')$ and $D_0 = D(y'', T'')$ upon excluding $e$ by (19).
8:    Set $T' \leftarrow (T \text{ contract } e) \cup (F \cup \{e\})$, and $y'_f \leftarrow y_f / y_e$ if $f \in T_2 \cup \{e\}$; $y'_f \leftarrow y_f$ if $f \in T_1$; $y'_f \leftarrow 1$ if $f \in F$.
9:    Compute estimates $P_1 = P(y', T')$ and $D_1 = D(y', T')$ upon including $e$ by (19).
10:   if $P_0 / D_0 > P_1 / D_1$ then
11:      Set $T \leftarrow T_1$ and $y \leftarrow y''$.
12:   else
13:      Set $T \leftarrow (T \text{ contract } e)$ and $y \leftarrow y'$. Also $F \leftarrow F \cup \{e\}$ and $y_e \leftarrow 1$.
14: output r-tour corresponding to $F$.

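Next consider the equation for $P$. We will show equality term-by-term in the expression (19) for $P$.

Consider the first summation $P_a(y, T) := \sum_{i=1}^{m} \sum_{v \in V(T) \cap L_i} w^i_v \cdot y_{\pi(v)}$. We show that the contribution of any term "$i \in [m]$ and $v \in L_i$" to $P_a(y, T \cup F)$ is the same as to $y_e \cdot P_a(y', T') + (1 - y_e) \cdot P_a(y'', T'')$.

\[
\begin{array}{l|c|c|c|c}
\text{Cases} & P_a(y, T \cup F) & P_a(y', T') & P_a(y'', T'') & y_e \cdot P_a(y', T') + (1 - y_e) \cdot P_a(y'', T'') \\ \hline
\pi(v) \in E(F) & w^i_v & w^i_v & w^i_v & w^i_v \\
\pi(v) \in E(T_1) & w^i_v \cdot y_{\pi(v)} & w^i_v \cdot y_{\pi(v)} & w^i_v \cdot y_{\pi(v)} & w^i_v \cdot y_{\pi(v)} \\
\pi(v) \in E(T_2) \cup \{e\} & w^i_v \cdot y_{\pi(v)} & w^i_v \cdot y_{\pi(v)} / y_e & 0 & w^i_v \cdot y_{\pi(v)}
\end{array}
\]

Next consider the second summation $P_b(y, T) := -\frac{1}{4\ell} \sum_{i=1}^{m} \sum_{u, v \in V(T) \cap L_i} w^i_v \cdot c^i_u \cdot \frac{y_{\pi(u)} \cdot y_{\pi(v)}}{y_{\theta(u,v)}}$. We again show equality $P_b(y, T \cup F) = y_e \cdot P_b(y', T') + (1 - y_e) \cdot P_b(y'', T'')$ for each term $\frac{y_{\pi(u)} \cdot y_{\pi(v)}}{y_{\theta(u,v)}}$ corresponding to "$i \in [m]$ and $u, v \in L_i$"; to reduce clutter we drop the multiplier $-\frac{1}{4\ell} w^i_v \cdot c^i_u$.

\[
\begin{array}{l|c|c|c|c}
\text{Cases} & P_b(y, T \cup F) & P_b(y', T') & P_b(y'', T'') & y_e \cdot P_b(y', T') + (1 - y_e) \cdot P_b(y'', T'') \\ \hline
\pi(u), \pi(v) \in E(F) & 1 & 1 & 1 & 1 \\
\pi(u) \in E(F),\ \pi(v) \in E(T_1) & y_{\pi(v)} & y_{\pi(v)} & y_{\pi(v)} & y_{\pi(v)} \\
\pi(u) \in E(F),\ \pi(v) \in E(T_2) \cup \{e\} & y_{\pi(v)} & y_{\pi(v)} / y_e & 0 & y_{\pi(v)} \\
\pi(u), \pi(v) \in E(T_1) & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} \\
\pi(u), \pi(v) \in E(T_2) \cup \{e\} & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)} \cdot y_e} & 0 & \frac{y_{\pi(u)} y_{\pi(v)}}{y_{\theta(u,v)}} \\
\pi(u) \in E(T_1),\ \pi(v) \in E(T_2) \cup \{e\} & y_{\pi(u)} \cdot y_{\pi(v)} & y_{\pi(u)} \cdot y_{\pi(v)} / y_e & 0 & y_{\pi(u)} \cdot y_{\pi(v)}
\end{array}
\]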



Since we have checked all cases, it follows that $P(y, T \cup F) = y_e \cdot P(y', T') + (1 - y_e) \cdot P(y'', T'')$. ■

This lemma implies that in any iteration,

\[ \max \Big\{ \frac{P(y', T')}{D(y', T')},\ \frac{P(y'', T'')}{D(y'', T'')} \Big\} \ \ge\ \frac{y_e \cdot P(y', T') + (1 - y_e) \cdot P(y'', T'')}{y_e \cdot D(y', T') + (1 - y_e) \cdot D(y'', T'')} = \frac{P(y, T \cup F)}{D(y, T \cup F)}. \]
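The inequality above is the usual fact that a ratio of convex combinations cannot exceed the larger of the two individual ratios when the denominators are positive. A small sketch of the comparison made in Step 10 of Algorithm 4.2, assuming $D_0, D_1 > 0$ (function names are illustrative):

```python
from fractions import Fraction


def better_branch(P1, D1, P0, D0):
    """Return 'exclude' if P0/D0 > P1/D1, else 'include'. Assumes D0, D1 > 0."""
    # cross-multiplication avoids division and is valid for positive denominators
    return "exclude" if P0 * D1 > P1 * D0 else "include"


def combined_ratio(P1, D1, P0, D0, y_e):
    """Ratio of the convex combinations with weight y_e on the 'include' branch."""
    return (y_e * P1 + (1 - y_e) * P0) / (y_e * D1 + (1 - y_e) * D0)
```

For example, with $P_1/D_1 = 3/2$, $P_0/D_0 = 1/2$ and $y_e = 1/4$, the combined ratio is $3/4$, which is below the ratio of the branch that is kept.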



So by induction, the ratio $\frac{P(y, T \cup F)}{D(y, T \cup F)}$ is non-decreasing over iterations. It can be seen that the denominator $D(y, T \cup F)$ can always be taken to be non-zero and therefore the final solution $F$ must contain at least one edge. Notice that at the start of Algorithm 4.2, $F$ is empty and the ratio is at least $\rho = \mathrm{LP}(B)/(4B)$ by (17) and (13). And at the end of the algorithm, the tree $T$ is empty, so the ratio is exactly:

\[ \Bigg( \sum_{i=1}^{m} \sum_{v \in V(F) \cap L_i} w^i_v \Big( 1 - \frac{1}{4\ell} \sum_{u \in V(F) \cap L_i} c^i_u \Big) \Bigg) \Bigg/ \Bigg( \sum_{f \in E(F)} d_f \Bigg) \ \ge\ \rho. \qquad (20) \]

The denominator is exactly the length of solution tree $F$. We will show that the numerator is at most $O(\ell)$ times the profit $\sum_{i=1}^{m} f_i(V(F))$ of $F$. This would imply that the profit/length ratio of solution $F$ is at least $\Omega(1/\ell) \cdot \rho$ as desired.

To upper bound the numerator in (20), define

\[ R_i := \begin{cases} L_i \cap V(F) & \text{if } c^i(L_i \cap V(F)) \le 4\ell, \\ \emptyset & \text{otherwise,} \end{cases} \qquad \forall i \in [m]. \]

Note that if $R_i = \emptyset$ then $c^i(L_i \cap V(F)) > 4\ell$, i.e. the contribution of $L_i \cap V(F)$ in the numerator of (20) is negative. On the other hand, if $R_i \ne \emptyset$ then the contribution of $L_i \cap V(F)$ is at most $w^i(L_i \cap V(F)) = w^i(R_i)$. So we obtain that the numerator of (20) is at most $\sum_{i=1}^{m} w^i(R_i)$. Since each $R_i$ has knapsack-size at most $4\ell$, a greedy partitioning as before implies there is a subset $R'_i \subseteq R_i$ with size $c^i(R'_i) \le 1$ and $w^i(R'_i) \ge w^i(R_i)/(8\ell)$; i.e. $f_i(V(F)) \ge w^i(R_i)/(8\ell)$. Rearranging, the numerator of (20) is at most $8\ell \sum_{i=1}^{m} f_i(V(F))$. Combined with the inequality in (20),

\[ \text{Ratio of solution } F = \frac{\sum_{i=1}^{m} f_i(V(F))}{d(V(F))} \ \ge\ \frac{\rho}{8\ell} = \frac{1}{32\ell} \cdot \frac{\mathrm{LP}(B)}{B}. \]

Thus we have proved:

Theorem 4.3 There is a deterministic $O(\ell)$-approximation algorithm for the ratio knapsack orienteering problem on depth $\ell$ trees. On general metrics there is an $O(\log^2 n)$-approximation algorithm.

The additional log-factor on general metrics is due to tree embedding which is randomized. This
step can also be made deterministic using the algorithm in H.

5 UGC Hardness of Approximation



In this section we prove an $\omega(1)$ UGC-hardness of approximation for StochVRP even for a very simple star-like metric with a setting of $\lambda$ that renders the recourse tour trivial. Our hardness result is based on the Unique Games Conjecture (UGC) of Khot [25], a restatement of which is given below.

Conjecture 5.1 (Unique Games Conjecture [25]) For any $\epsilon > 0$, there is a positive integer $p$ such that: given a system of 2-variable linear equations over $\mathbb{Z}_p$, each of the form $x_i - x_j = c_{ij} \bmod p$, it is NP-hard to distinguish between the following two cases: (i) YES CASE: There is an assignment to the variables that satisfies a $1 - \epsilon$ fraction of the equations. (ii) NO CASE: Any assignment satisfies at most an $\epsilon$ fraction of the equations.

Based on UGC, Bansal and Khot proved the following hardness of approximation result for minimum vertex cover on almost $k$-partite $k$-uniform hypergraphs, which shall be the starting point of our reduction.

Theorem 5.2 Assuming the Unique Games Conjecture, for any $\epsilon > 0$ and positive integer $k \ge 2$, given a $k$-uniform hypergraph $G$ with vertex set $U$ and hyperedge set $E$, it is NP-hard to distinguish between the following two cases:

YES CASE: There is a partition of $U$ into $k + 1$ disjoint subsets $X, U_1, \ldots, U_k$ such that $|X| \le \epsilon |U|$ and the hypergraph induced by $U \setminus X$ (consisting of vertex set $U \setminus X$ and hyperedge set $\{e \cap (U \setminus X) \mid e \in E,\ |e \cap (U \setminus X)| > 0\}$) is $k$-partite with $U_1, \ldots, U_k$ as the $k$-partition. That is, any hyperedge $e \in E$ has at most one vertex from any $U_i$. This implies that $X \cup U_i$ is a vertex cover in $G$ for each $i = 1, \ldots, k$, and that the minimum vertex cover in $G$ has size at most $(1/k + \epsilon)|U|$.

NO CASE: The size of the maximum independent set in $G$ is at most $\epsilon |U|$, and therefore the size of the minimum vertex cover in $G$ is at least $(1 - \epsilon)|U|$.

In the rest of this section we shall give a hardness reduction from the problem of distinguishing between $k$-uniform hypergraphs which are almost $k$-partite (as in the YES case of Theorem 5.2) from those that have a very small maximum independent set (as in the NO case of Theorem 5.2).



5.1 Hardness Reduction 

Fix any positive integer $k \ge 2$. Let us suppose we are given a $k$-uniform hypergraph $G$ on vertex set $U$ and with hyperedge set $E$ as a hard instance from Theorem 5.2, where we shall fix the parameter $\epsilon$ in Theorem 5.2 later. We transform $G(U, E)$ into an instance of StochVRP as follows. For clarity, in this section the nomenclature of "vertices" shall be in the context of the hypergraph, while "points" shall be used for the corresponding elements in the metric.

Metric $(V, d)$. The set of points $V$ in the metric is $U \cup \{r\}$, where $r$ is the root. The distances $d$ are defined as follows. Let $d(r, u) = L$, where $L = (|U|/2k + 1/2)$, for all $u \in U$. Further, for each pair $u, u' \in U$, $u \ne u'$, let $d(u, u') = 1$. It is easy to see that $d$ is a metric. This simple metric can be realized by the shortest paths in a star-like tree of distances as illustrated in Figure 2.

Capacity and Demands. The capacity $Q = 1$ and demands will be in $\{0, 1\}$.

Demand Distribution $\mathcal{D}$. There are polynomially many scenarios $m = |E|$, each having uniform probability. Every hyperedge $e \in E$ is a scenario having demand of one at all points in $e$, and zero demand elsewhere.

Figure 2: Tree of distances realizing metric $d$, with intermediate point $x$ and $V = \{r, u_1, \ldots, u_n\}$.

Parameter $\lambda$. We set $\lambda = 2m|U|(k + 1)$.

Before we proceed to the analysis of this reduction, we note that the cost of the minimum cost r-tour covering points $S \subseteq V \setminus \{r\}$ is simply $|U|/k + |S|$. Also, the optimal value is at most $\lambda/m$. Consider the fixed tour consisting of $k$ identical r-tours each covering $U$: since each scenario has at most $k$ demands, this solution never uses a recourse tour, and has cost $k \cdot (|U|/k + |U|) < \lambda/m$. So we may assume that the optimal solution has no recourse tour: if the recourse tour is non-empty in any scenario then its cost is at least $\lambda/m$.
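The reduction is simple enough to write down explicitly. The sketch below builds the parameters of the StochVRP instance from a hypergraph given as a vertex count and a list of hyperedges; the function names and the tuple layout of the result are illustrative choices, not the paper's notation.

```python
from fractions import Fraction


def build_stochvrp_instance(num_vertices, hyperedges, k):
    """Return (L, lam, scenarios) for the reduction: root distance, recourse
    inflation factor, and equally likely scenarios (probability, demand points)."""
    n, m = num_vertices, len(hyperedges)
    L = Fraction(n, 2 * k) + Fraction(1, 2)   # d(r, u) for every point u
    lam = 2 * m * n * (k + 1)                 # parameter lambda
    scenarios = [(Fraction(1, m), frozenset(e)) for e in hyperedges]
    return L, lam, scenarios


def rtour_cost(S, L):
    """Cheapest r-tour through the points S: out to S, |S| - 1 unit hops, and back."""
    return 2 * L + (len(S) - 1) if S else Fraction(0)
```

With 4 vertices, `k = 2` and two hyperedges, an r-tour through two points costs $|U|/k + |S| = 4$, and two identical r-tours through all of $U$ cost 12, below $\lambda/m = 24$.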



5.2 Analysis 

We now give the analysis. 



YES Case. Suppose that $G(U, E)$ is a YES instance of Theorem 5.2 with $X, U_1, \ldots, U_k$ as the partition of $U$ with the properties as stated in the theorem. Consider the r-tours $\tau_1, \ldots, \tau_k$, where $\tau_i$ is an r-tour that covers points $X \cup U_i$ (in addition to $r$). Since every scenario in our instance of StochVRP corresponds to a hyperedge in $G$, using the property in the YES case that each hyperedge has at most one vertex from each $U_i$, we see that the r-tours $\tau_1, \ldots, \tau_k$ satisfy all the scenarios. As noted earlier the cost of each r-tour that covers $S \subseteq V \setminus \{r\}$ is $|U|/k + |S|$. Therefore the total cost of the $k$ r-tours $\tau_1, \ldots, \tau_k$ is

\[ k \cdot (|U|/k) + \sum_{i=1}^{k} |X \cup U_i| \ \le\ |U| + (1 + k\epsilon)|U| = (2 + k\epsilon)|U|, \]

by the properties of the partition $X, U_1, \ldots, U_k$ of $U$.



NO Case. Suppose that $G(U, E)$ is a NO instance of Theorem 5.2, so that the maximum independent set in $G$ is of size at most $\epsilon |U|$. In this case we shall prove that the total cost of any set of r-tours that satisfy all scenarios is at least $k(1 - f_k(\epsilon))|U|$, where $f_k(\epsilon) \to 0$ as $\epsilon \to 0$ for any fixed positive integer $k \ge 2$. We may assume that the number of r-tours in the optimal solution is at most $k^2$, otherwise the total cost will be at least $k^2 (|U|/k) = k|U|$ and we shall be done. Therefore, let $\tau_1, \ldots, \tau_T$ be the r-tours in an optimal fixed tour, where $T \le k^2$. We shall estimate the number of points in $U$ which occur in at most $k - 1$ of these r-tours. For any subset $I \subseteq [T]$, let $U(I) \subseteq U$ be the points which do not occur in $\{\tau_i : i \in [T] \setminus I\}$. We have the following simple lemma.

Lemma 5.3 For any $I \subseteq [T]$ with $|I| = k - 1$, $U(I)$ is an independent set in $G$.

Proof: For a contradiction, suppose that $e$ is a hyperedge induced by $U(I)$. Since $|e| = k$, the scenario corresponding to $e$ will not be satisfied by our solution as the $k$ vertices of $e$ appear (as points) in at most $k - 1$ of the r-tours, namely those given by $I \subseteq [T]$. Recall that each r-tour can serve only one demand. ■

The total number of points in $U$ that appear in at most $k - 1$ of the r-tours is upper bounded by

\[ \sum_{I \subseteq [T],\ |I| = k - 1} |U(I)|. \]

There are $\binom{T}{k-1} \le 2^T \le 2^{k^2}$ choices for the subsets $I$ in the above expression. Using the fact that any independent set in $G$ has size at most $\epsilon |U|$, the fraction of points in $U$ that occur in at most $k - 1$ of the r-tours is at most $\epsilon\, 2^{k^2} =: f_k(\epsilon)$. Each of the remaining $(1 - f_k(\epsilon))|U|$ points appears in at least $k$ of the r-tours; so the total cost of the fixed tour is at least $k(1 - f_k(\epsilon))|U|$.
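Because $Q = 1$ and demands are 0/1, a fixed tour satisfies a scenario without recourse exactly when the demand points of the hyperedge can be matched to distinct r-tours that visit them. The sketch below checks this with a standard augmenting-path bipartite matching; it is an illustration of the argument in Lemma 5.3, with r-tours represented simply as sets of points (an assumption of this example).

```python
def scenario_satisfied(edge, tours):
    """True iff every point of `edge` can be served by a distinct r-tour containing it."""
    match = {}  # tour index -> point currently served by that tour

    def try_assign(u, seen):
        for j, tour in enumerate(tours):
            if u in tour and j not in seen:
                seen.add(j)
                # tour j is free, or its current point can be moved elsewhere
                if j not in match or try_assign(match[j], seen):
                    match[j] = u
                    return True
        return False

    return all(try_assign(u, set()) for u in edge)
```

In particular, if the $k$ points of a hyperedge appear in only $k - 1$ r-tours, the check fails, which is the contradiction used in the proof.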

Hardness Factor. In the YES case there is a solution of cost at most $(2 + k\epsilon)|U|$, whereas in the NO case any solution has cost at least $k(1 - f_k(\epsilon))|U|$. For any positive integer $k \ge 2$ and arbitrarily small $\delta > 0$, choosing $\epsilon > 0$ to be small enough in Theorem 5.2, we obtain a hardness factor of $k/2 - \delta$.



6 Independent Demand Distributions 

In this section we give an $O(\log(n\lambda)/\log\log(n\lambda))$-approximation for StochVRP under independent distributions $\mathcal{D}$. That is, the demand $q_v$ at each vertex $v$ is independent of all other vertices $V \setminus \{v\}$. The main idea is to show the existence of a near-optimal solution that partitions vertices into two disjoint sets $D_1$ and $D_2$ such that: vertices $D_1$ are served by the fixed tour w.h.p., and vertices $D_2$ are served in the recourse tour. This step (Lemma 6.1) uses independence. Then we show how an LP-based approach (combined with sampling) yields a constant approximation to the problem of choosing the best partition $(D_1, D_2)$. For any $v \in V$ let $\mu_v := E[q_v]$ be the expected demand at $v$; note that $\max_{v \in V} \mu_v \le Q$, the vehicle capacity. We assume by scaling that the optimal value $\mathrm{opt} \ge 1$.

Lemma 6.1 Given any instance of StochVRP with independent demands, there exists a partition $D_1 \cup D_2 = V$ and a StochVRP solution with fixed tour $\tau$ such that:

• The total expected demand in each r-tour of $\tau$ is at most $Q$.

• The length of $\tau$ is $O(\log(n\lambda)/\log\log(n\lambda)) \cdot \mathrm{opt}$.

• $\tau$ does not visit any $D_2$-vertex; i.e. each $u \in D_2$ is served in the recourse tour.

• Each $v \in D_1$ is served by $\tau$ with probability at least $1 - 1/(n\lambda)^4$.

• The recourse cost is at most $\mathrm{opt} + 1$.

Proof: Consider an optimal fixed tour $\tau^*$. Let $D_1 \subseteq V$ denote the vertices visited at least once in $\tau^*$;
note that a vertex might be visited multiple times. Clearly the minimum spanning tree on vertices
$D_1 \cup \{r\}$ satisfies $\mathrm{MST}(D_1) \le d(\tau^*) \le \mathrm{opt}$. Using the "flow lower bound" in VRP ||] it is also clear that:

$$\mathrm{opt} \;\ge\; \frac{2}{Q} \sum_{v \in V} d(r, v) \cdot \mu_v \;\ge\; \frac{2}{Q} \sum_{v \in D_1} d(r, v) \cdot \mu_v \;=\; \mathrm{Flow}(D_1).$$

Recall that each $\mu_v \le Q$. Thus if we consider a deterministic VRP instance with demands $\{\mu_v : v \in D_1\}$
and capacity $Q$, then it has a solution $\tau'$ of length at most $O(1) \cdot (\mathrm{MST}(D_1) + \mathrm{Flow}(D_1))$ by [p2| , |2|.
From the above, we have $d(\tau') \le O(1) \cdot \mathrm{opt}$. Let $\tau'_1, \ldots, \tau'_\ell$ denote the r-tours in $\tau'$, each having total
$\mu$-value at most $Q$. We define the fixed tour $\tau$ to consist of $\beta := c \cdot \frac{\log(n\lambda)}{\log\log(n\lambda)}$ copies of $\tau'$, where $c$ is
a large enough constant. The first three properties of $\tau$ are immediate. For any r-tour $\tau'_j$, if the total
instantiated demand $\sum_{v \in \tau'_j} q_v \le \beta \cdot Q$ then all these demands can be served by $\tau$, since it contains $\beta$
copies of $\tau'_j$. Thus the probability that some $v \in D_1$ (say with $v \in \tau'_j$) is not served by $\tau$ is:



$$\Pr[v \text{ not covered by } \tau] \;\le\; \Pr\Big[\sum_{w \in \tau'_j} q_w > \beta \cdot Q\Big] \;\le\; \frac{1}{(n\lambda)^4},$$

by a Chernoff bound [28] using the fact that $\beta = \Theta\left(\frac{\log(n\lambda)}{\log\log(n\lambda)}\right)$. This proves the fourth property. For
the final property, note that the vertices in $D_2$ are never served by the optimal fixed tour $\tau^*$. So the
expected VRP value on $D_2$ (scaled by $\lambda$) is at most opt. This is the recourse cost that our solution
corresponding to $\tau$ pays for $D_2$. In addition, some $D_1$-vertices may be uncovered in $\tau$ and the recourse
cost due to these is at most:



$$\lambda \cdot 2D \cdot \sum_{v \in D_1} \Pr[v \text{ not covered by } \tau] \;\le\; \frac{2 n \lambda D}{(n\lambda)^4} \;\le\; 1,$$



where we used the fact that diameter D = 0(n ) from Subsection 3.1. So the total expected recourse 
cost is at most opt + 1 as claimed. ■ 
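The replication step in the proof can be checked numerically on small instances. The following is a minimal sketch, not the paper's procedure: it assumes integer-valued demands given as probability mass functions, and the names `num_copies` and `overflow_probability` as well as the default constant `c` are hypothetical. It computes $\beta$ and the exact probability that the demand on one r-tour exceeds $\beta \cdot Q$.

```python
import math


def num_copies(n, lam, c=4.0):
    """Number of copies beta = ceil(c * log(n*lam) / log log(n*lam)).

    The constant c is a placeholder; the proof only needs it "large enough".
    """
    t = math.log(n * lam)
    if t <= 1.0:
        raise ValueError("need n * lam > e so that log log(n * lam) > 0")
    return math.ceil(c * t / math.log(t))


def overflow_probability(pmfs, beta, Q):
    """Exact Pr[sum of independent demands > beta * Q] for one r-tour.

    pmfs is a list with one dict per vertex on the tour, mapping an
    integer demand value to its probability.
    """
    dist = {0: 1.0}  # distribution of the running sum
    for pmf in pmfs:
        nxt = {}
        for s, ps in dist.items():
            for q, pq in pmf.items():
                nxt[s + q] = nxt.get(s + q, 0.0) + ps * pq
        dist = nxt
    return sum(p for s, p in dist.items() if s > beta * Q)
```

For comparison, if demands are scaled by $Q$ so that the tour's total is a sum $X$ of independent $[0,1]$-valued variables with $E[X] \le 1$, the standard multiplicative Chernoff bound gives $\Pr[X > \beta] \le (e/\beta)^{\beta}$ for $\beta > 1$, which is the kind of tail estimate the fourth property relies on.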

We now find an approximately optimal solution to independent StochVRP that has the above structure.
We write an IP formulation to capture the partition $(D_1, D_2)$. For $v \in V$ let $x_v \in \{0, 1\}$ denote
the indicator that $v \in D_1$. By Lemma 6.1 the fixed tour $\tau$ corresponds to a deterministic VRP solution
with demands $\{\mu_v \cdot x_v : v \in V\}$. Using MST and flow bounds (as in Section ||) we can express this
(losing a constant factor) via linear constraints in $x$.

We also need to write the expected VRP value (scaled by $\lambda$) due to demands in $D_2$. This involves
the expected VRP value (equivalently EMST + EFlow) of the random instance where each $v \in V$ has
an independent demand of $(1 - x_v) \cdot q_v$; recall that $q_v$ denotes $v$'s demand in the original StochVRP
instance. The expectation EFlow is just $\frac{2}{Q} \sum_{v \in V} d(r, v) \cdot \mu_v (1 - x_v)$. Unfortunately it is not clear if one
can write linear constraints (in $x$) for the expectation of MST: this involves the expected MST value
when each $v \in V$ is present independently with probability $(1 - x_v) \cdot \Pr[q_v > 0]$. Instead we show that
sampling can be used to estimate EMST within small error, and that the sample expectation can be
expressed via linear constraints in $x$.

For any $x \in \{0, 1\}^V$ define $T(x) := E[\mathrm{MST}(S_x)]$ where $S_x$ contains each vertex $v \in V$ independently
w.p. $(1 - x_v) \cdot p_v$, where $p_v := \Pr[q_v > 0]$. We now use the result of |l(| as in Theorem 3.1. We make
$m = \mathrm{poly}(n, \lambda)$ independent samples $S^1, \ldots, S^m \subseteq V$ according to $\{p_v\}_{v \in V}$ and set

$$\hat{T}(x) := \frac{1}{m} \sum_{i=1}^{m} \mathrm{MST}\left(\{v \in S^i : x_v = 0\}\right).$$

Then we have $|\hat{T}(x) - T(x)| \le 1$ for all $x \in \{0, 1\}^V$ with probability $1 - o(1)$.
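To make the estimator concrete, here is a minimal sketch of the sample-average MST next to a brute-force evaluation of the exact expectation on tiny instances. It assumes that MST(·) always spans the depot $r$ together with the given vertices (as in the proof of Lemma 6.1) and that the metric is supplied as a function `dist`; the function names are hypothetical.

```python
import random
from itertools import combinations


def mst_weight(points, dist):
    """Weight of a minimum spanning tree on `points` (Prim, O(k^2))."""
    pts = list(points)
    if len(pts) <= 1:
        return 0.0
    # best[p] = cheapest connection from p to the tree built so far
    best = {p: dist(pts[0], p) for p in pts[1:]}
    total = 0.0
    while best:
        p = min(best, key=best.get)
        total += best.pop(p)
        for q in best:
            d = dist(p, q)
            if d < best[q]:
                best[q] = d
    return total


def draw_samples(vertices, p, m, rng):
    """m independent samples; each contains v independently w.p. p[v]."""
    return [{v for v in vertices if rng.random() < p[v]} for _ in range(m)]


def t_hat(x, samples, root, dist):
    """Sample average of MST({root} + {v in S^i : x_v = 0})."""
    total = 0.0
    for s in samples:
        kept = {v for v in s if x[v] == 0}
        total += mst_weight([root, *kept], dist)
    return total / len(samples)


def t_exact(x, vertices, p, root, dist):
    """Exact E[MST(S_x)] by enumerating all subsets (tiny inputs only)."""
    active = [v for v in vertices if x[v] == 0 and p[v] > 0]
    total = 0.0
    for k in range(len(active) + 1):
        for subset in combinations(active, k):
            chosen = set(subset)
            prob = 1.0
            for v in active:
                prob *= p[v] if v in chosen else 1.0 - p[v]
            total += prob * mst_weight([root, *subset], dist)
    return total
```

The same samples are reused for every $x$, which is what allows $\hat{T}(x)$ to be written with one set of tree variables per sample. On a line metric with depot 0 and vertices 1 and 2 each present with probability 1/2, `t_exact` evaluates to 1.25.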
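We now write the following integer program for finding the partition $(D_1, D_2)$.

$$
\begin{array}{lll}
\min & \displaystyle \sum_{e \in E} d_e z_e + \frac{2}{Q} \sum_{v \in V} d(r, v)\, \mu_v\, x_v + \lambda \left[ \frac{2}{Q} \sum_{v \in V} d(r, v)\, \mu_v (1 - x_v) + \frac{1}{m} \sum_{i=1}^{m} \sum_{e \in E} d_e z_e^i \right] & \\[2ex]
\text{s.t.} & \displaystyle \sum_{e \in \delta(R)} z_e \ge x_v & \forall R \subseteq V \setminus \{r\},\ \forall v \in R, \\[2ex]
 & \displaystyle \sum_{e \in \delta(R)} z_e^i \ge 1 - x_v & \forall i \in [m],\ \forall R \subseteq V \setminus \{r\},\ \forall v \in R \cap S^i, \\[2ex]
 & x_v \in \{0, 1\} & \forall v \in V, \\[1ex]
 & z_e,\ z_e^i \ge 0 & \forall e \in E,\ \forall i \in [m].
\end{array}
$$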

The last term in the objective captures $\hat{T}(x)$. Based on the preceding discussion, w.h.p. this integer
program expresses the objective of all partitions $(D_1, D_2)$ up to a constant factor. Relaxing the
integrality on $x$ we obtain an LP relaxation that can be solved in polynomial time. The rounding
algorithm simply chooses $D_1 = \{v \in V : x_v \ge \frac{1}{2}\}$ and $D_2 = V \setminus D_1$. We output the fixed tour $\tau$ to
be $O(\log(n\lambda)/\log\log(n\lambda))$ copies of an approximate VRP solution on demands $\{\mu_v : v \in D_1\}$. The recourse
step involves greedily satisfying the instantiated demands on $\tau$, and then computing an approximate
VRP solution on the residual demands. Using Lemma 6.1 it is easy to show that this achieves an
$O(\log(n\lambda)/\log\log(n\lambda))$-approximation, i.e. Theorem 1.4.
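The rounding and the greedy part of the recourse step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the LP solution is taken as a given dictionary `x`, each r-tour is a list of vertices, a vertex's demand is served whole or not at all, and each r-tour has a budget of $\beta \cdot Q$ because it is repeated $\beta$ times. The function names are hypothetical.

```python
def round_partition(x, threshold=0.5):
    """Threshold rounding: D1 = {v : x_v >= 1/2}, D2 = the rest."""
    d1 = {v for v, val in x.items() if val >= threshold}
    d2 = set(x) - d1
    return d1, d2


def greedy_residual(tours, beta, Q, demand):
    """Serve instantiated demands along the fixed tour greedily.

    Each r-tour appears beta times in the fixed tour, so it can carry
    beta * Q units in total. Vertices whose demand no longer fits are
    returned; they must be handled by the recourse tour.
    """
    residual = set()
    for tour in tours:
        budget = beta * Q
        for v in tour:
            q = demand.get(v, 0)
            if q == 0:
                continue  # nothing to deliver at v today
            if q <= budget:
                budget -= q
            else:
                residual.add(v)
    return residual


def recourse_vertices(d2, residual, demand):
    """Vertices the recourse VRP must serve: D2 plus leftovers from D1."""
    return {v for v in d2 | residual if demand.get(v, 0) > 0}
```

The threshold rule loses only a factor of 2 in the cut constraints: the rounded indicator $\bar{x}_v$ satisfies $\bar{x}_v \le 2 x_v$ and $1 - \bar{x}_v \le 2(1 - x_v)$, so doubling $z$ and every $z^i$ keeps each constraint satisfied.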